Vectors and matrices


Introduction 


This book is largely concerned with matrices, i.e. rectangular arrays of 
numbers or other elements. Such arrays have many applications and are an 
essential tool in many fields of study, including physics, engineering and of 
course mathematics itself. 


The book consists of three units. The first, Unit 4, concerns vectors and 
matrices. It starts with a review of vectors — a mainly geometric concept 
that you should have already encountered in your earlier studies. It then 
introduces matrices and shows how matrices that take the form of a single 
row or column can be used to represent a vector. Having formed this link 
between its two main subjects, the unit goes on to show how square 
matrices can be used to represent geometric transformations, such as 
stretching (dilation) or turning (rotation). From this it develops the 
general rules for adding and multiplying matrices, and hence provides the 
foundations of matrix algebra. 


The remaining two units, Units 5 and 6, continue the discussion of 
matrices. Unit 5 concentrates particularly on the use of matrices in the 
treatment of systems of simultaneous linear equations, and thus leads to 
the subject known as linear algebra. Unit 6 extends this work by focusing 
on the treatment of systems of linear differential equations. 


Studying these three units in sequence will teach you a great deal about 
matrices, and will especially emphasise the significance of eigenvalues and 
eigenvectors, two concepts that will be mentioned several times and which 
play an important part in many different applications. 


Study guide 


Section 1 reviews simple features of scalar and vector quantities in two and 
three dimensions. Much of this should be familiar to you from previous 
study. If so, feel free to skim the text, but make sure that you do the 
exercises. 


Section 2 explains two important ways of forming a product from two
vectors a and b. The scalar product a · b produces a scalar quantity. The
vector product a × b produces a vector, perpendicular to the plane
containing a and b. Each kind of product is of great utility in the physical
sciences.


Section 3 introduces matrices. It relates vectors and matrices, and 
considers the use of square matrices to represent geometric transformations 
of a plane. This leads to a discussion of the multiplication of matrices and 
hence to some simple examples of matrix algebra. 


Section 4 provides more practice in matrix multiplication and gives the 
general rules of matrix algebra, applicable to matrices of any size. In 
addition, it provides methods for calculating entities known as 
determinants and inverses of matrices. It also revisits the scalar and vector 
products of Section 2, using matrix notation, thereby emphasising the 
ability of matrix methods to encapsulate and simplify important results. 
1.1 Indicating and representing vectors


One way to distinguish vector quantities from scalar quantities is as 
follows. 


A scalar quantity is one that can be specified by a single number or 
by the combination of a number and a unit of measurement. 


A vector quantity is one that requires both a magnitude and a 
direction for its complete specification. 


Examples of scalar quantities include the number π = 3.1415..., a mass of
5 kg, a distance of 2.5 m, a speed of $3.0 \times 10^{8}\ \mathrm{m\,s^{-1}}$, and a temperature of
−30 °C. Those scalars that can be described by numbers alone, such as the
number of peas in a pod, are called numerical quantities.


A common example of a vector quantity is displacement. The
displacement from London to Brighton is (approximately) 74 km due
South, and the (approximate) displacement from Milton Keynes to Oxford
is 47 km South-West. Note that the specification of a displacement involves
a magnitude (in this case given by a distance in km) and a direction (in
this case given as a compass bearing). The magnitude of a vector quantity
is not allowed to be negative. It is correct to say that the displacement
from Brighton to London is 74 km North, but it is not correct to describe
that displacement as −74 km South.


Figure 1 represents part of a plane that includes the points A, B, C
and D. The displacement from A to B is indicated by an arrow. That is
appropriate since an arrow has a magnitude (its length expressed in
appropriate units) and a direction (its orientation).


A symbol that is conventionally used to represent the displacement from A
to B is $\overrightarrow{AB}$. However, an even more common convention is the use of a
bold letter, traditionally s, to represent a displacement. This too can be
seen in Figure 1, where the displacement $\overrightarrow{AB}$ is represented by the
symbol $\mathbf{s}_1$, and the displacement $\overrightarrow{CD}$ is represented by $\mathbf{s}_2$.


Displacement, specified by a direction and a distance, is not the only
vector quantity. Other examples include velocity, which is specified by a
direction and a speed, and force, which is specified by a direction and a
force strength. Note that distance, speed and force strength are each
non-negative quantities.


Figure 1 Displacement vectors represented by arrows may be indicated by
symbols such as $\overrightarrow{AB}$ or $\mathbf{s}_1$


Throughout this module, symbols representing vectors will generally be 
printed in bold. So, for example, we may indicate a force by F and a 
velocity by v. When writing by hand, the same indication is given by
underlining the symbol, e.g. $\underline{v}$.
Underlining symbols that represent vectors


It is very important to underline handwritten symbols that denote 
vectors. If you fail to do so, those reading your work may not be able 
to tell that you are referring to a vector, and you may be penalised. 


The magnitude of a vector, s say, is best handwritten as |s|. Sometimes in 
the text, when there is no possibility of ambiguity, we will simply display it 
as s. The magnitude of a vector is sometimes referred to as its modulus. 


Exercise 1 


The displacement from Brighton to Oxford is (approximately) 133 km 
North-West. Use this, together with the displacements given in the text 
above, to sketch a rough map showing the relative locations of London, 
Brighton, Oxford and Milton Keynes. According to your map, what is the 
approximate distance between Milton Keynes and London? How should 
this distance be described in terms of the displacement vector from London 
to Milton Keynes? 


1.2 Equating vectors 


Having reviewed the definition and representation of vectors, we can now 
begin to develop the algebra of vectors. This will occupy several 
subsections and will lead us into detailed considerations of the addition 
and multiplication of vectors. We begin, however, with the fundamental 
idea of what it means to say that two vectors are equal. 


Recalling that a vector quantity is completely specified by its direction and 
magnitude, we have the following definition. 


Two vectors are equal if they have the same direction and the same
magnitude.


Figure 2 is similar to Figure 1 apart from an extra point E and a new
displacement vector, $\mathbf{s}_3$, that stretches from B to E. The direction and
distance from B to E are the same as those from C to D, so the displacement
$\mathbf{s}_3$ is equal to the displacement $\mathbf{s}_2$, and we can write $\mathbf{s}_2 = \mathbf{s}_3$.


Note that the different starting points of $\mathbf{s}_2$ and $\mathbf{s}_3$ in Figure 2 do not
prevent the displacements from being equal. We may use points such as B
and E when we specify a displacement, but the relevant displacement, $\mathbf{s}_3$
in this case, is completely specified by its direction and magnitude; it is
not in any sense ‘tied’ to the particular points B and E.


Figure 2 The displacement $\mathbf{s}_2$ is equal to the displacement $\mathbf{s}_3$


We do not usually associate a direction with the zero vector, as
all zero vectors are equal.


More generally, when we use arrows to represent any kind of vector 
quantity, the point at which we choose to draw the tail of the arrow is of 
no intrinsic significance. (Of course, the effect that a vector has when 
applied at one point may be quite different from the effect of applying the 
same vector at a different point, but that is irrelevant to the specification 
of the vector.) Thus, when considering vectors in their own right (rather 
than their effects), we are free to move the arrows that represent the 
vectors from one place to another, provided that we do not change their 
direction or magnitude. This is a principle that we will use in the next 
section, when we consider the scaling and addition of vectors. 


1.3 Scaling and adding vectors


Scaling vectors 


The two arrows in Figure 3(a) represent vectors g and h. Both vectors 
point in the same direction and are therefore said to be parallel. The 
arrow representing h is twice as long as the arrow representing g, so we 
write h = 2g and say that the vector h is equal to g scaled by 2. More 
generally, if v is a vector and m is a positive scalar, then the scaled vector
p = mv is a vector in the same direction as v but with magnitude m|v|.


Figure 3 Multiplying a vector by a scalar


We can also scale a vector by a negative value. If m is a negative scalar, 
the vector mv will still have a positive magnitude |m||v| (magnitudes are 
never negative), but mv will point in the opposite direction to v. Vectors 
that point in opposite directions are often said to be antiparallel. 


A special case of scaling occurs when m = −1. The negatively scaled
vector (−1)v is normally written as −v. The two arrows in Figure 3(b)
therefore represent the scaled vectors −g and −h = −2g.


What happens when we scale a vector by zero (i.e. when m = 0)? The 
above definitions imply that the result is a vector of zero magnitude: this 
is called the zero vector, and is represented by the bold symbol 0. 


Collecting together the results of this subsection, we have the following. 
Scaling a vector


For any vector v and any scalar m, the result of scaling v by m is
represented by the product mv and is the vector with magnitude
|m||v| that is:

• parallel to v if m > 0

• antiparallel to v if m < 0

• the zero vector 0 if m = 0.
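
The three cases in the box can be sketched in code. The following is only an illustrative sketch, not part of the module text: it assumes a two-dimensional vector stored as a pair (magnitude, direction angle in radians), and the function name `scale_polar` is hypothetical.

```python
import math

def scale_polar(m, magnitude, angle):
    """Scale a 2D vector given as (magnitude, direction angle in radians).

    Returns the pair (new magnitude, new angle). The zero vector is
    returned with angle None, since no direction is associated with it.
    """
    if m == 0 or magnitude == 0:
        return (0.0, None)                      # the zero vector
    new_magnitude = abs(m) * magnitude          # |m||v| is never negative
    if m > 0:
        new_angle = angle                       # parallel to v
    else:
        new_angle = (angle + math.pi) % (2 * math.pi)   # antiparallel to v
    return (new_magnitude, new_angle)

doubled = scale_polar(2, 3.0, 0.5)     # same direction, twice the magnitude
reversed_v = scale_polar(-1, 3.0, 0.5) # same magnitude, opposite direction
zero = scale_polar(0, 3.0, 0.5)        # the zero vector
```

Storing the direction as an angle keeps the sketch close to the geometric definition; the component-based version of scaling appears later in Subsection 1.4.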


Adding vectors 


Consider two successive displacements indicated in Figure 4. The first, $\mathbf{s}_1$,
takes us from P to Q. The second, $\mathbf{s}_2$, takes us from Q to R. The net
result is described by the single displacement s, which takes us directly
from P to R. In this sort of situation we interpret s as the result of adding
the displacements $\mathbf{s}_1$ and $\mathbf{s}_2$, so we write

\[ \mathbf{s} = \mathbf{s}_1 + \mathbf{s}_2 \quad\text{or}\quad \overrightarrow{PR} = \overrightarrow{PQ} + \overrightarrow{QR}. \]

Note that in adding the displacements we have had to take the directions
of $\mathbf{s}_1$ and $\mathbf{s}_2$ into account; we were not able simply to add their
magnitudes. To emphasise this we refer to the general process of adding
vectors as vector addition, and we call the outcome the resultant.


Figure 4 Two successive displacements $\mathbf{s}_1$ and $\mathbf{s}_2$, and their net result, s


In the simple case of vector addition considered above, we could determine 
the resultant by using a triangular diagram. This graphical approach 
provides the basis of a more general triangle rule that we can use to add 
any two vectors of the same physical type — two displacements say, or two 
velocities. The triangle rule takes into account our freedom to locate the 
tail of a vector arrow wherever we want; it can be stated as follows. 


The triangle rule 


To add a vector a to a vector b of the same physical type, first draw
an arrow to represent a, then draw an arrow to represent b so that its
tail is coincident with the head of the arrow representing a. An arrow
drawn from the tail of a to the head of b then represents the resultant
of the vector addition a + b (see Figure 5).


Figure 5 The triangle rule for vector addition


Combining this interpretation of vector addition with what has already
been said about scaling a vector allows us to make sense of vector
subtraction, since we can write

\[ \mathbf{a} - \mathbf{b} = \mathbf{a} + (-1)\mathbf{b}. \]

As you can see, the expression on the right is just the vector sum of a and
the negatively scaled vector −b (which is antiparallel to b). A trivial case
occurs when a = b, which gives the very natural-looking vector equation

\[ \mathbf{a} - \mathbf{a} = \mathbf{0}. \]
The definition of the sum of two vectors is readily extended to give the
sum of many vectors, a + b + c + ···. We simply use the triangle rule to
find the sum s = a + b, and then use it again to add c to s, and we keep
on going in this way until all the vectors in the sum have been added.
Figure 6 illustrates the process. In effect, the arrows are strung together in
a chain, head to tail, with the head of one arrow coincident with the tail of
the next. The arrow representing the final resultant vector is then
obtained by joining the tail of the first arrow to the head of the last.


Figure 6 Extending the triangle rule to more than two vectors


It is clear from the triangle rule that a + b = b + a, and it follows that
when adding (and subtracting) several vectors, the order in which we carry
out the various additions and subtractions has no influence on the final
result. We describe this by saying that vector addition is commutative.


Figure 7 Forces $\mathbf{F}_1$ and $\mathbf{F}_2$ are equivalent to the resultant F


Example 1 


The vector a points to the East and has magnitude 3. The vector b points 
to the North and has magnitude 2. 


(a) Draw a diagram (with North at the top and East to the right) showing
arrows representing a, 2b and c = a + 2b.


(b) What is the magnitude of the vector c? 


(c) What is the angle between the direction of c and the direction of a?
Specify your answer as an angle between 0 and π/2 radians.


Solution


(a) An appropriate diagram is shown in Figure 8. 


Figure 8 


The arrow for a is 3 units long. The arrow for b is 2 units long, so the 
arrow for 2b is 4 units long. The arrows for a and 2b are mutually 
perpendicular. The arrow for c is then as shown in the diagram. 


(b) The arrow for c forms the hypotenuse of a right-angled triangle. Using
Pythagoras’s theorem, the magnitude of c is

\[ |\mathbf{c}| = \sqrt{3^2 + 4^2} = 5. \]

(c) Let θ be the angle between the directions of c and a. Then the
diagram shows that tan θ = 4/3, so θ = arctan(4/3) = 0.927 radians.
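
The arithmetic in this solution can be checked numerically. The sketch below is not part of the original solution; it assumes a convention in which East is the x-direction and North is the y-direction, and it uses Cartesian components, which are introduced in Subsection 1.4.

```python
import math

# Assumed convention for this sketch: East is +x, North is +y.
a = (3.0, 0.0)   # magnitude 3, pointing East
b = (0.0, 2.0)   # magnitude 2, pointing North

# c = a + 2b, formed component by component
c = (a[0] + 2 * b[0], a[1] + 2 * b[1])

# Part (b): Pythagoras's theorem gives the magnitude of c
magnitude_c = math.hypot(c[0], c[1])

# Part (c): angle between c and the direction of a (East), in radians
theta = math.atan2(c[1], c[0])

print(c, magnitude_c, round(theta, 3))
```

The printed magnitude and angle agree with the values 5 and 0.927 radians found above.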


Exercise 2 


(a) Find the magnitude of the vector h = 2a − 3b, where a and b are as
defined in Example 1.


(b) What is the angle between the direction of h and the direction of 2a?
Specify your answer as an angle between 0 and π/2 radians.


1.4 Cartesian components and basic vector algebra 


Despite the usefulness of the triangle rule, graphical methods are not 
generally very accurate and are difficult to apply in three dimensions. For 
these reasons, this subsection will introduce methods based on the use of 
Cartesian components of vectors that will enable us to define the equality, 
scaling and addition of vectors in algebraic terms. We start with a review 
of Cartesian coordinates. 
Cartesian coordinates are named in honour of philosopher and
mathematician René Descartes (1596-1650), who pioneered the
geometric use of coordinates in his 1637 book La Géométrie.


Cartesian coordinate systems


Figure 9(a) shows a three-dimensional system of Cartesian coordinates. 
Such a system consists of three mutually perpendicular axes that meet at 
a point O called the origin. The axes are conventionally labelled x, y
and z, with the z-axis oriented vertically. Each axis is given a positive
sense, as indicated by the single arrowhead drawn on that axis. This gives
a unique meaning to terms such as positive x-direction (i.e. parallel to the
arrowed direction on the x-axis), and negative z-direction (i.e. antiparallel
to the arrowed direction on the z-axis).


Figure 9 (a) A three-dimensional Cartesian coordinate system with
origin at the point O. (b) The coordinates of any point P may be read
from the axes, and described by an ordered triple (x, y, z).


If number lines are drawn along each axis, with 0 at the origin and the 
numbers increasing in the positive direction, it becomes possible to 
associate a numerical value called a coordinate with every point on each 
axis. As indicated in Figure 9(b), any three values of the x-, y- and
z-coordinates, such as x = 3, y = −2 and z = −1, will then determine a
unique point in space, such as P. Moreover, every point in
three-dimensional space will correspond to a unique set of the three
coordinate values. (That’s what we mean by saying that space has three
dimensions.) We can now agree to indicate any specified point by
presenting its three coordinate values $x = x_1$, $y = y_1$ and $z = z_1$ as an
ordered triple of values $(x_1, y_1, z_1)$, always giving the three coordinates
in the same conventional order, separated by commas and enclosed in
round brackets. Using that convention, we can represent the point P by
(3, −2, −1) and say that the origin O is at the point (0, 0, 0). A similar
system may be applied to two dimensions by simply omitting the third
coordinate and representing a point by an ordered pair of values such as
$(x_1, y_1)$.


In many physical situations it is necessary to associate coordinates with 
measured distances in specified directions. This can be done by 
multiplying each coordinate by an appropriate unit of measurement, such 
as the metre (m). 


1 Vectors 


Right-handed systems of coordinates 


When working in three dimensions there are two fundamentally different 
ways of arranging three mutually perpendicular axes. The resulting 
systems of Cartesian coordinates are described as right-handed systems 
and left-handed systems; it is important to know how to tell them apart 
since, by convention, we use only right-handed systems unless there is a 
very good reason to do otherwise. Figure 10 shows a right-handed 
system of coordinates, and its caption describes the right-hand rule that 
can be used to distinguish such a system from its left-handed counterpart. 


Figure 10 The handedness of a given Cartesian system can be checked
using the right-hand rule. (a) Point the straightened fingers of your 
right hand in the direction of the positive x-axis, and rotate your wrist 
until you find that you can bend your fingers in the direction of the 
positive y-axis. (b) Extend the thumb of your right hand. If it points in 
the direction of the positive z-axis, the frame is right-handed. (If your 
thumb points in the negative z-direction, the system is left-handed.) 


Using right-handed systems of Cartesian coordinates 


When using Cartesian coordinates in three dimensions, it is
conventional to work with right-handed systems.


Conventionally oriented two-dimensional axes are also described as
right-handed.


Exercise 3 


Imagine that you are standing with your feet at the origin of a 
three-dimensional system of Cartesian coordinates. Your head is at some 
point on the positive z-axis, and you are looking into the region between 
the positive x-axis and the positive y-axis. Suppose that the system is 
right-handed and the positive x-axis is on your right. What will be on your
left — the positive y-axis or the negative y-axis? 
Exercise 4


Which of the sets of perpendicular axes in the figure below define
right-handed coordinate systems?


[Figure: four sets of perpendicular axes, labelled (a), (b), (c) and (d)]


The x-axis points out of the plane of the page in (a) and (b). The z-axis
points respectively into and out of the plane of the page in (c) and (d).


Cartesian unit vectors 


When working with vectors in a region described by a right-handed system 
of Cartesian coordinates, it is often helpful to introduce a set of three 
special vectors, one directed parallel to each positive coordinate axis, and 
each having magnitude 1. These three vectors are called Cartesian unit 
vectors and are illustrated in Figure 11; they are conventionally labelled 
i in the positive x-direction, j in the positive y-direction, and k in the 
positive z-direction. For neatness, i, j and k have been drawn along the 
axes, with their tails at the origin but, as you know, that is not essential: 
the tails can be placed anywhere because the vectors are completely 
specified by their direction and magnitude. Note that the magnitude of a
unit vector really is the numerical quantity 1; there are no units.


Figure 11 The unit vectors i, j and k each have magnitude 1 and
respectively point in the positive x-, y- and z-directions


The great merit of unit vectors is that they can be easily scaled to produce
vectors of any desired magnitude in any of the positive or negative
coordinate directions. For example, 6i is a vector in the positive
x-direction with magnitude |6| |i| = 6 × 1 = 6. Similarly, −3k is a vector in
the negative z-direction with magnitude |−3| |k| = 3 × 1 = 3. More
generally, if λ is a non-zero scalar, the direction of λj will depend on
whether λ is greater than zero or less than zero; in either case, the
magnitude of λj will be just |λ|.


Even in those cases where we associate coordinates with physical lengths 
by multiplying them by units such as the metre, it is still the case that 
unit vectors have magnitude 1, not 1 metre. So, for example, for the 
displacement vector s = (2.5 m) i, we say that its magnitude is

\[ |\mathbf{s}| = |(2.5\,\mathrm{m})\,\mathbf{i}| = |2.5\,\mathrm{m}|\,|\mathbf{i}| = 2.5\,\mathrm{m} \times 1 = 2.5\,\mathrm{m}. \]
Cartesian components


Now, the crucial observation is that in three dimensions any vector can be
represented by a linear combination of the unit vectors i, j and k. This is
indicated in Figure 12 for a vector a, which is depicted as the vector sum
of the three mutually perpendicular vectors $a_x\mathbf{i}$, $a_y\mathbf{j}$ and $a_z\mathbf{k}$.
Algebraically, we can write this sum as

\[ \mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}. \]

This is called the Cartesian component form of a. The three vectors
$a_x\mathbf{i}$, $a_y\mathbf{j}$ and $a_z\mathbf{k}$ are called the Cartesian component vectors of a.
However, each of those component vectors is itself the result of multiplying
a Cartesian unit vector by a scalar. The three scalars involved, $a_x$, $a_y$
and $a_z$, are called the Cartesian scalar components or simply
Cartesian components of a. In Figure 12 each Cartesian scalar
component is positive, but in the general case each may be positive, zero
or negative. Scalar components are referred to more frequently than
component vectors, so any general references to ‘components’ or even
‘Cartesian components’ should always be interpreted as ‘Cartesian scalar
components’ unless there is a clear indication to the contrary. With this in
mind we will usually refer to $a_x$ as the x-component of a, $a_y$ as the
y-component, and $a_z$ as the z-component.


In three dimensions, a vector can often be most conveniently specified in
terms of its components. For a vector a this might be done using the linear
combination $a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$ or an ordered triple $(a_x, a_y, a_z)$. In either
case, the vector is said to be in component form. We thus have two
equivalent ways of writing a vector in component form.

\[ \mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k} \tag{1} \]

or equivalently,

\[ \mathbf{a} = (a_x, a_y, a_z). \tag{2} \]

If a vector a has a known direction and a known magnitude a, then we can
use trigonometry to determine its scalar components. The general
procedure is illustrated in Figure 13 for the case of the x-component. It
shows that

\[ a_x = a\cos\theta_x, \quad\text{where } 0 \le \theta_x \le \pi, \tag{3} \]

where $\theta_x$ is the angle between the direction of a and the positive
x-direction. Similar formulas, with the analogous angles $\theta_y$ and $\theta_z$, give the
y- and z-components $a_y$ and $a_z$.


Figure 12 The vector a is the vector sum of the three mutually
perpendicular component vectors $a_x\mathbf{i}$, $a_y\mathbf{j}$ and $a_z\mathbf{k}$


Note that the ordered triple notation for a vector is identical
to that for the coordinates of a point.


Figure 13 Finding the x-component of a vector


Figure 14 Finding the magnitude of a vector with known components


Conversely, if we know the component form of a vector a, so the values of
$a_x$, $a_y$ and $a_z$ are known, then it is easy to determine the magnitude and
direction of a. Using Pythagoras’s theorem twice, as in Figure 14, shows
that the magnitude of a is given by

\[ a = |\mathbf{a}| = \sqrt{a_x^2 + a_y^2 + a_z^2}. \tag{4} \]

Substituting this result into equation (3) and rearranging shows that the
angle $\theta_x$ between a and the x-axis is given by

\[ \cos\theta_x = \frac{a_x}{\sqrt{a_x^2 + a_y^2 + a_z^2}}, \quad\text{where } 0 \le \theta_x \le \pi. \tag{5} \]

Again, similar results will apply in the y- and z-directions, giving
analogous expressions for $\cos\theta_y$ and $\cos\theta_z$. These three cosines will
uniquely determine the direction of any non-zero vector.
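
Equations (3), (4) and (5) translate directly into a short calculation. The following is a minimal sketch under stated assumptions: vectors are plain tuples of three numbers, the function names `magnitude` and `direction_angles` are hypothetical, and the sample vector is chosen for illustration only.

```python
import math

def magnitude(a):
    """Equation (4): a = sqrt(a_x^2 + a_y^2 + a_z^2)."""
    a_x, a_y, a_z = a
    return math.sqrt(a_x**2 + a_y**2 + a_z**2)

def direction_angles(a):
    """Equation (5) and its y- and z-analogues, each angle in [0, pi]."""
    mag = magnitude(a)
    if mag == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(math.acos(component / mag) for component in a)

a = (2.0, -1.0, 2.0)          # illustrative vector, not taken from the text
a_mag = magnitude(a)
theta_x, theta_y, theta_z = direction_angles(a)

# Equation (3) should recover the x-component from magnitude and angle
a_x_recovered = a_mag * math.cos(theta_x)

# The squares of the three direction cosines sum to 1
cos_sq_sum = sum(math.cos(t) ** 2 for t in (theta_x, theta_y, theta_z))
```

Note that `math.acos` returns values between 0 and π, which matches the range stated for the angle in equation (5).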


Incidentally, you may have noticed that in Figures 12, 13 and 14, the 
vector a has been drawn with its tail at the origin, but this, of course, is 
not essential. The results that we have quoted depend only on the 
components of vectors and not on the point that we have chosen to 
represent the origin of our coordinate system. 


We should also note that a common way of specifying a point with 
coordinates (x,y,z) is in terms of a displacement from the origin of a 
Cartesian coordinate system to the point. Such a vector is referred to as 
the position vector of the given point and is usually represented by the 
symbol r. The components of the position vector are $r_x = x$, $r_y = y$ and
$r_z = z$, so we may write r = xi + yj + zk or equivalently r = (x, y, z). As
far as addition and scaling are concerned, position vectors may be treated 
in the same way as any other displacements. 


Exercise 5 


The figure in the margin shows two vectors, a and b, in a two-dimensional 
Cartesian coordinate system. In this system the components of a and b 
happen to be integers. 


(a) Determine the components of a and b by visual inspection, then 
express each vector as a linear combination of unit vectors and as an 
ordered pair of scalar components. 


(b) Use the components of a and b to determine their magnitudes, and for 
each vector find the angle between the direction of the vector and the 
positive x-direction. 



Unit vectors in other directions 


Given a vector a, it is often necessary to construct a unit vector in the
same direction, as indicated in Figure 15. Such a general unit vector is
usually denoted by $\hat{\mathbf{a}}$; it will have magnitude 1, and is obtained by dividing
the non-zero vector a by its own magnitude. Thus

\[ \hat{\mathbf{a}} = \frac{\mathbf{a}}{|\mathbf{a}|}. \tag{6} \]

The vector $\hat{\mathbf{a}}$ is just a scaled by 1/|a|, so $\hat{\mathbf{a}}$ lies in the same direction as a
and has magnitude

\[ |\hat{\mathbf{a}}| = \left| \frac{1}{|\mathbf{a}|}\,\mathbf{a} \right| = \frac{1}{|\mathbf{a}|}\,|\mathbf{a}| = 1. \]


Figure 15 A unit vector $\hat{\mathbf{a}}$ in the direction of vector a


The x-component of the unit vector $\hat{\mathbf{a}}$ is given by $\cos\theta_x$, the quantity
described by equation (5). In fact, using its component form, we can write
the unit vector in the direction of a as

\[ \hat{\mathbf{a}} = (\hat{a}_x, \hat{a}_y, \hat{a}_z) = (\cos\theta_x, \cos\theta_y, \cos\theta_z) = \left( \frac{a_x}{a}, \frac{a_y}{a}, \frac{a_z}{a} \right). \tag{7} \]

We see that the information contained in $\hat{\mathbf{a}}$ is just the direction of a.


As a simple illustration of this, suppose that a = (1, 2, 0). It then follows
from equation (4) that $a = \sqrt{1^2 + 2^2} = \sqrt{5}$, so the unit vector in the
direction of a is $\hat{\mathbf{a}} = (1/\sqrt{5},\, 2/\sqrt{5},\, 0)$.
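
The illustration with a = (1, 2, 0) can be reproduced in a few lines. This is a sketch only, assuming vectors stored as tuples; the function name `unit_vector` is hypothetical.

```python
import math

def unit_vector(a):
    """Equation (6): divide a non-zero vector by its own magnitude."""
    mag = math.sqrt(sum(component**2 for component in a))
    if mag == 0:
        raise ValueError("cannot form a unit vector from the zero vector")
    return tuple(component / mag for component in a)

a_hat = unit_vector((1.0, 2.0, 0.0))

# The result should have magnitude 1 (up to rounding error)
a_hat_magnitude = math.sqrt(sum(component**2 for component in a_hat))
```

The explicit check for zero magnitude reflects the requirement in the text that a be a non-zero vector.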


Basic vector algebra with Cartesian components 


Vectors were introduced earlier as essentially geometric entities, and 
actions such as equating, scaling and adding vectors were all introduced in 
geometric terms. Now, however, following the introduction of components, 
we can give each of these actions an algebraic interpretation in terms of 
components. So, for example, we already know that two vectors
$\mathbf{a} = (a_x, a_y, a_z)$ and $\mathbf{b} = (b_x, b_y, b_z)$ will be equal if they have the same
direction and magnitude, but we also know that the direction and
magnitude of a vector are determined by the vector’s components.
Consequently, the two vectors will be equal if their corresponding
components are equal, so a necessary and sufficient condition for a = b is
that

\[ a_x = b_x, \quad a_y = b_y, \quad a_z = b_z. \]

Similarly, the scaling of the vector a by the scalar λ to produce the vector
λa can be interpreted as the multiplication of each component of a by λ.
So the operation of scaling is represented algebraically by the relation

\[ \lambda\mathbf{a} = \lambda(a_x, a_y, a_z) = (\lambda a_x, \lambda a_y, \lambda a_z). \]

In a similar way, the vector sum a + b that was introduced geometrically
using the triangle rule can be reinterpreted in terms of the sum of
corresponding components. So given $\mathbf{a} = (a_x, a_y, a_z)$ and $\mathbf{b} = (b_x, b_y, b_z)$,
we can say that their vector sum is given by

\[ \mathbf{a} + \mathbf{b} = (a_x + b_x,\; a_y + b_y,\; a_z + b_z). \]



Unit 4 Vectors and matrices 


Figure 16 A 
component-based 
reinterpretation (in two 
dimensions) of the basic 
operations of vector algebra: 
(a) the equivalence of two 
vectors; (b) the scaling of a 
vector by a scalar; (c) the 
addition of two vectors 
These algebraic reinterpretations are indicated graphically (in two
dimensions) in Figure 16, but the transition from geometry to algebra that
they constitute is of such significance that we also reproduce their
three-dimensional versions in the following summary.


Basic vector algebra in terms of Cartesian components


• Given the component form of a vector $\mathbf{a} = (a_x, a_y, a_z)$, or
equivalently, $\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$, the magnitude of a is given by

\[ a = |\mathbf{a}| = \sqrt{a_x^2 + a_y^2 + a_z^2}, \tag{8} \]

and the direction of a can be indicated by the unit vector

\[ \hat{\mathbf{a}} = \frac{\mathbf{a}}{|\mathbf{a}|} = \left( \frac{a_x}{a}, \frac{a_y}{a}, \frac{a_z}{a} \right) = (\cos\theta_x, \cos\theta_y, \cos\theta_z). \tag{9} \]

• Given also a second vector $\mathbf{b} = (b_x, b_y, b_z)$, or equivalently,
$\mathbf{b} = b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}$, the vectors a and b are said to be equal,
and we write a = b, when

\[ a_x = b_x, \quad a_y = b_y, \quad a_z = b_z. \tag{10} \]

• The scaling of vector a by the scalar λ produces the scaled vector

\[ \lambda\mathbf{a} = (\lambda a_x, \lambda a_y, \lambda a_z) \tag{11} \]

or equivalently,

\[ \lambda\mathbf{a} = \lambda a_x\mathbf{i} + \lambda a_y\mathbf{j} + \lambda a_z\mathbf{k}. \tag{12} \]

• The vector addition of the vectors a and b produces the
resultant

\[ \mathbf{a} + \mathbf{b} = (a_x + b_x,\; a_y + b_y,\; a_z + b_z) \tag{13} \]

or equivalently,

\[ \mathbf{a} + \mathbf{b} = (a_x + b_x)\mathbf{i} + (a_y + b_y)\mathbf{j} + (a_z + b_z)\mathbf{k}. \tag{14} \]
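
The component rules in this summary map naturally onto short functions. The sketch below is one possible way to express them, assuming vectors are stored as tuples of three numbers; the names `equal`, `scale` and `add` are hypothetical and the sample vectors are for illustration only.

```python
def equal(a, b):
    """Equation (10): vectors are equal when corresponding components are equal."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

def scale(lam, a):
    """Equation (11): multiply each component of a by the scalar lam."""
    return tuple(lam * x for x in a)

def add(a, b):
    """Equation (13): add corresponding components of a and b."""
    return tuple(x + y for x, y in zip(a, b))

a = (1, 2, 3)       # illustrative vectors, not taken from the text
b = (4, -5, 6)

scaled = scale(2, a)          # each component doubled
total = add(a, b)             # component-wise sum
difference = add(a, scale(-1, b))   # a - b written as a + (-1)b
```

Exact comparison with `==` is adequate for integer components such as these; with floating-point components a tolerance-based comparison (for example `math.isclose`) would usually be a safer choice.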


These first steps in vector algebra naturally suggest that we can go further, 
based on the further exploitation of Cartesian components. We will do this 
in the next section when we discuss two extremely useful ways of forming 
products of vectors. 


Example 2

Let a = i + j + k, b = 2i − 3j − k and c = 3i + k.

(a) Express d = 2a − 3b and e = a − 2b + 4c in component form.

(b) Find the magnitudes of the vectors d and e.

(c) Evaluate |a|, and write down a unit vector in the direction of a.

(d) Find the components of a vector g such that a + g = b.


Solution

(a) d = 2(i + j + k) − 3(2i − 3j − k) = −4i + 11j + 5k,
e = (i + j + k) − 2(2i − 3j − k) + 4(3i + k) = 9i + 7j + 7k.

(b) Using equation (8),

\[ |\mathbf{d}| = \sqrt{(-4)^2 + 11^2 + 5^2} = \sqrt{162} = 9\sqrt{2}, \]

\[ |\mathbf{e}| = \sqrt{9^2 + 7^2 + 7^2} = \sqrt{179}. \]

(c) \[ |\mathbf{a}| = \sqrt{1^2 + 1^2 + 1^2} = \sqrt{3}. \]

Using equation (9), a unit vector in the direction of a is

\[ \hat{\mathbf{a}} = \frac{\mathbf{a}}{|\mathbf{a}|} = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k}). \]

(d) If a + g = b, then

g = b − a = (2i − 3j − k) − (i + j + k) = i − 4j − 2k.

Thus the scalar components of g are $g_x = 1$, $g_y = -4$ and $g_z = -2$.
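
The results of Example 2 can be verified with a short calculation. This sketch is not part of the original solution; it assumes vectors stored as tuples of components, and the helper `combine` is a hypothetical name for forming a sum of scaled vectors.

```python
import math

a = (1, 1, 1)      # a = i + j + k
b = (2, -3, -1)    # b = 2i - 3j - k
c = (3, 0, 1)      # c = 3i + k

def combine(*terms):
    """Sum of scaled vectors; each term is a (scalar, vector) pair."""
    return tuple(sum(s * v[k] for s, v in terms) for k in range(3))

d = combine((2, a), (-3, b))            # d = 2a - 3b
e = combine((1, a), (-2, b), (4, c))    # e = a - 2b + 4c
g = combine((1, b), (-1, a))            # g = b - a

mag_d = math.sqrt(sum(x * x for x in d))
mag_e = math.sqrt(sum(x * x for x in e))
mag_a = math.sqrt(sum(x * x for x in a))
```

Comparing `mag_d` with `9 * math.sqrt(2)` using `math.isclose` confirms the simplification of √162 in part (b).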


Exercise 6

Let a = 2i − j, b = i + 3j + 5k and c = j − 2k.

(a) Find the magnitudes of a and b.

(b) Find the values of $\theta_x$, $\theta_y$ and $\theta_z$, giving the direction of a.

(c) Find the vectors a + b, 2a − b and c + 2b − 3a in component form.

(d) For the displacement vector $\overrightarrow{PQ} = 2\mathbf{a} - \mathbf{b}$, where the point P is
(0, 2, 3), find the endpoint Q.

(e) For the displacement vector $\overrightarrow{RS} = \mathbf{a} + 2\mathbf{b}$, where the point R is
(1, 1, 0), find the endpoint S.


Exercise 7 


Confirm that the unit vector $\hat{\mathbf{a}} = \mathbf{a}/|\mathbf{a}|$ of equation (9) does indeed have magnitude 1.


Vector equation of a straight line 


One useful application of position vectors (in two or three dimensions) is in 
obtaining a vector equation of a straight line. 


[Figure 17: the points P, R and Q on a straight line, drawn with the origin O and the x- and y-axes]


If 0 ≤ t ≤ 1, then the equation represents only the line segment PQ.


Example 3


Find the position vector of a point R lying on the straight-line 
segment PQ (see Figure 17) in terms of the position vectors of P and Q. 


Solution 


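Using the triangle rule, the position vector $\overrightarrow{OR}$ can be written as


$\overrightarrow{OR} = \overrightarrow{OP} + \overrightarrow{PR}$.


Now $\overrightarrow{PR} = t\,\overrightarrow{PQ}$ for some number t, and the point R traces out the line
segment PQ as t varies from 0 to 1. Thus the straight-line segment PQ is
described by the vector equation


$\overrightarrow{OR} = \overrightarrow{OP} + t\,\overrightarrow{PQ} \quad (0 \le t \le 1)$.


Writing $\mathbf{p} = \overrightarrow{OP}$, $\mathbf{q} = \overrightarrow{OQ}$, $\mathbf{r} = \overrightarrow{OR}$, and noting (using the triangle rule)
that $\overrightarrow{PQ} = \overrightarrow{OQ} - \overrightarrow{OP} = \mathbf{q} - \mathbf{p}$, this equation can also be written as


$\mathbf{r} = \mathbf{p} + t(\mathbf{q} - \mathbf{p}) = (1 - t)\mathbf{p} + t\mathbf{q} \quad (0 \le t \le 1)$.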


Note that if the parameter t in Example 3 is allowed to range over all the 
real numbers (−∞ < t < ∞), then the point R traces out the entire
straight line of which PQ is a segment. Also note that the ideas in 
Example 3 are easily extended to three dimensions. 


Vector equation of a straight line 


If P and Q are any two distinct points on a straight line in space, 
with position vectors p and q, respectively, then the vector 
equation of the straight line is 


$\mathbf{r}(t) = \mathbf{p} + t(\mathbf{q} - \mathbf{p}) = (1 - t)\mathbf{p} + t\mathbf{q} \quad (-\infty < t < \infty)$, (15)


where r(t) represents the position vector of any point on the line. 
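As an illustration of equation (15), here is a minimal Python sketch that evaluates r(t) for given endpoints. The function name `line_point` and the sample points are assumptions made for this example only; they do not come from the text.

```python
def line_point(p, q, t):
    """Return r(t) = (1 - t) p + t q, the point on the line through p and q.

    p and q are position vectors given as tuples of equal length;
    t = 0 gives p, t = 1 gives q, and 0 <= t <= 1 stays on the segment PQ.
    """
    return tuple((1 - t) * pi + t * qi for pi, qi in zip(p, q))

# Illustrative points (not taken from the exercises).
p = (0.0, 2.0, 3.0)
q = (4.0, 0.0, 1.0)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, line_point(p, q, t))
```

Values of t outside the interval from 0 to 1, such as t = 2 above, give points on the line that lie beyond the segment PQ.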


Exercise 8 


Write down, in component form, the vector equation of the straight line on 
which lie the points with Cartesian coordinates (1,1,2) and (2,3, 1). 


Vector-valued functions 


Recall that a real-valued function f(t) is an entity that gives a real value
for each value of the variable t. The vector equation of a straight line
introduced above, equation (15), is an example of something called a
vector-valued function, i.e. an entity r(t) that gives a vector for each
value of the variable t. The components of the straight line r(t) in the
solution to Exercise 8 were linear functions of t: r(t) = (1 + t, 1 + 2t, 2 − t).
More generally, the components of some curve in space will be
r(t) = (x(t), y(t), z(t)), where x(t), y(t) and z(t) are some general
real-valued functions of t (Figure 18).


Suppose that the position of a particle moving along some curved path is
given by r(t) = (x(t), y(t), z(t)) at time t. We want to find the velocity and
acceleration of the particle, given by the derivatives of r(t) with respect
to t. These are obtained by differentiating the components.


Example 4 


A particle has position $\mathbf{r}(t) = (3t^2 - 2,\; t^4,\; -t + 1)$ at time t. Find its
velocity.


Solution 


The velocity of the particle is given by the derivative of r(t) with respect 
to t: 


$\mathbf{v}(t) = \dot{\mathbf{r}}(t) = \dfrac{d}{dt}(3t^2 - 2,\; t^4,\; -t + 1) = (6t,\; 4t^3,\; -1)$.
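Differentiating component by component can be checked numerically. The sketch below is not part of the original text; it assumes that the position in Example 4 reads r(t) = (3t² − 2, t⁴, −t + 1), and it compares the componentwise derivative with a central-difference estimate. The names `r`, `v_exact` and `derivative` are illustrative.

```python
def r(t):
    """Position of the particle, as assumed from Example 4."""
    return (3 * t**2 - 2, t**4, -t + 1)

def v_exact(t):
    """Velocity obtained by differentiating each component of r(t)."""
    return (6 * t, 4 * t**3, -1)

def derivative(f, t, h=1e-5):
    """Central-difference estimate of the derivative of a vector-valued function."""
    forward, backward = f(t + h), f(t - h)
    return tuple((fp - fm) / (2 * h) for fp, fm in zip(forward, backward))

for t in (-1.0, 0.5, 2.0):
    print(t, v_exact(t), derivative(r, t))
```

The same `derivative` helper could be applied to `v_exact` to estimate an acceleration, which is the subject of the next exercise.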


Exercise 9 


Find the acceleration $\mathbf{a} = \dot{\mathbf{v}}$ of the particle of Example 4.


2 Products of vectors 


There are two very useful ways of forming the product of two vectors a 
and b. The first method produces a scalar quantity, represented by a · b,
and is called the scalar product or the dot product of the two vectors. The
second method produces a vector quantity, represented by a × b, and is
called the vector product or the cross product of the two vectors.


We discuss the two products in turn, starting with the scalar product. In 
each case we start with a geometric view that emphasises directions and 
magnitudes, just as we did when defining the scaling and addition of 
vectors. However, we very quickly go on to express each product 
algebraically, in terms of components, and to examine its characteristic 
properties and applications. 


Figure 18 A curve r(t) in two-dimensional space. The vectors $\mathbf{r}(t_1)$ and $\mathbf{r}(t_2)$ indicate the values of r(t) at two values of the independent variable t.


Figure 19 The angle θ (0 ≤ θ ≤ π) between two vectors


The product a · b is read as ‘a dot b’, and for this reason is often referred to as the dot product.


This property does not obviously follow from equation (16), but will become obvious once we discuss the component form of the scalar product.


2.1 The scalar product


Consider two (non-zero) vectors a and b. No matter how a and b are 
specified (they might be given as displacements between particular points, 
or the velocities of particular objects), we can always use our freedom to 
move the arrows that represent vectors parallel to themselves to ensure 
that their tails meet at a point. This makes it easy to visualise the angle θ
between the directions of a and b, as indicated in Figure 19, and we can
always take that angle to be in the range 0 ≤ θ ≤ π. Using θ we can define
the scalar product geometrically, as follows.


The scalar product of two vectors a and b is the scalar quantity
$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}| \cos\theta$, (16)


where θ (0 ≤ θ ≤ π) is the angle between the directions of a and b.


The angle θ always lies in the range 0 ≤ θ ≤ π, so the value of a · b is:
• positive when θ is an acute angle (i.e. in the range 0 ≤ θ < π/2)
• negative when θ is an obtuse angle (i.e. in the range π/2 < θ ≤ π)
• zero when θ is a right angle (i.e. when θ = π/2).


The last of these conditions tells us that if θ = π/2, i.e. when a and b are
perpendicular, then a · b = 0. The definition also implies that if θ = 0, i.e.
when a and b are parallel, then a · b = |a| |b| = ab.


A special case worth remembering is that the scalar product of a with
itself is just the square of the magnitude of a:


$\mathbf{a} \cdot \mathbf{a} = |\mathbf{a}|^2 = a^2$. (17)


Note that the scalar product is a product in the mathematical sense, with 
a number of mathematically significant properties. For example, it is 
commutative so that 


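$\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$. (18)
It also has the further properties of being distributive over addition,
meaning that

$\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c}$, (19)
and linear with respect to multiplication by a scalar λ, so that

$(\lambda\mathbf{a}) \cdot \mathbf{b} = \mathbf{a} \cdot (\lambda\mathbf{b}) = \lambda(\mathbf{a} \cdot \mathbf{b})$. (20)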


These properties allow us to make sense of scalar products that involve 
sums and brackets. 


Example 5 
Expand the expression x · y, given that x = 2u + v and y = u − 5v.
Calculate its value when u and v are perpendicular unit vectors.
Solution
Using the mathematical properties of the scalar product, we can see that
x · y = (2u + v) · (u − 5v)

= (2u) · (u − 5v) + v · (u − 5v)

= (2u) · u + (2u) · (−5v) + v · u + v · (−5v)

= 2(u · u) − 10(u · v) + v · u − 5(v · v)

= 2(u · u) − 9(u · v) − 5(v · v).
Now u · u = |u|² = 1 and v · v = |v|² = 1 when u and v are unit vectors.
Furthermore, u · v = 0 when u and v are perpendicular vectors. So when
u and v are perpendicular unit vectors, we have


x · y = 2 − 0 − 5 = −3.


Exercise 10 


Three vectors a, b and c of magnitudes 2, 4 and 1, respectively, lying in 
the same plane, are represented by arrows as shown in the figure below. 


b 


The angle between the vectors a and b is 3 radians, and that between the 
vectors b and c is & radians. Use the definition of the scalar product to 


find the values of a · b, b · c, a · c and b · b.


Exercise 11 
(a) Expand the expression (a + b) · (a − b).
(b) Expand the expression |a + b|².


(c) Write down the value of a · b, in terms of |a| and |b|, when a and b
are antiparallel.



This solution is given in detail 
to show you there are no 
unexpected pitfalls when dealing 
with scalar products. The basic 
lesson is that the familiar rules 
of algebra still apply, so with 
practice you will not need to go 
through all these intermediate 
steps. 


Recall that |a|² = a · a.


The scalar product in terms of components


Since i, j and k are mutually perpendicular unit vectors, the following 
useful relations must be true. 


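$\mathbf{i} \cdot \mathbf{i} = \mathbf{j} \cdot \mathbf{j} = \mathbf{k} \cdot \mathbf{k} = 1$, (21)


and all other scalar products of Cartesian unit vectors (such as i · j)
give zero.


Consequently, using the usual rules of algebra, it can be seen that
$\mathbf{a} \cdot \mathbf{b} = (a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}) \cdot (b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k})$
$= a_x\mathbf{i} \cdot (b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}) + a_y\mathbf{j} \cdot (b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}) + a_z\mathbf{k} \cdot (b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k})$
$= a_x b_x\,\mathbf{i} \cdot \mathbf{i} + a_x b_y\,\mathbf{i} \cdot \mathbf{j} + a_x b_z\,\mathbf{i} \cdot \mathbf{k} + a_y b_x\,\mathbf{j} \cdot \mathbf{i} + a_y b_y\,\mathbf{j} \cdot \mathbf{j} + a_y b_z\,\mathbf{j} \cdot \mathbf{k} + a_z b_x\,\mathbf{k} \cdot \mathbf{i} + a_z b_y\,\mathbf{k} \cdot \mathbf{j} + a_z b_z\,\mathbf{k} \cdot \mathbf{k}$
$= a_x b_x + a_y b_y + a_z b_z$.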


This gives us the following very useful expression for the scalar product of 
two vectors in terms of their components. 


Component form of the scalar product 
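If $\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$ and $\mathbf{b} = b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}$, then
$\mathbf{a} \cdot \mathbf{b} = a_x b_x + a_y b_y + a_z b_z$. (22)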


Many other results follow from this. For example, the relation 
a · (b + c) = a · b + a · c that was stated earlier becomes easy to prove. It
is also easy to confirm that the Cartesian scalar components of a vector a
are given by
$a_x = \mathbf{i} \cdot \mathbf{a}, \quad a_y = \mathbf{j} \cdot \mathbf{a}, \quad a_z = \mathbf{k} \cdot \mathbf{a}$, (23)
and it follows from equations (16) and (22) that the angle between two
non-zero vectors is given by
$\cos\theta = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \dfrac{a_x b_x + a_y b_y + a_z b_z}{ab}$, (24)
where 0 ≤ θ ≤ π.
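Equations (22) and (24) translate directly into a short calculation. The following Python sketch is illustrative only; the function names `dot` and `angle_between` and the sample vectors are assumptions made here, not part of the text.

```python
import math

def dot(a, b):
    """Scalar product from components: a . b = ax*bx + ay*by + az*bz (equation (22))."""
    return sum(x * y for x, y in zip(a, b))

def angle_between(a, b):
    """Angle (in radians, between 0 and pi) between two non-zero vectors (equation (24))."""
    cos_theta = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    # Guard against rounding errors that push the cosine slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

# Illustrative vectors: the first pair is at 45 degrees, the second pair is perpendicular.
print(angle_between((1, 0, 0), (1, 1, 0)))
print(dot((1, 2, 3), (3, 0, -1)))
```

The clamping step is a defensive choice for floating-point work; it does not change the mathematics of equation (24).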


As we are now using the scalar product in a more algebraic way, this is an 
appropriate point at which to note that the use of algebra has allowed 
mathematicians to generalise the idea of what constitutes a vector and, 
consequently, what constitutes a scalar product of vectors. Using 
components, it is easy to imagine extending the definitions given earlier to 
more than three dimensions, but the mathematical generalisations go well 
beyond this. As you will see in Unit 11, even functions may be treated as 
‘vectors’ in an appropriate space. In these generalised approaches, the
analogue of the scalar product is often called the inner product, and the
generalisation of the condition a · b = 0, the vanishing of the inner product
of two (generalised) vectors, is referred to as the orthogonality condition.
For this reason, even when dealing with ‘ordinary’ two- or
three-dimensional vectors, you will often find that the terms
perpendicular and orthogonal are used interchangeably. You will also
find that the scalar product of two- or three-dimensional vectors is
sometimes said to provide a test for orthogonality, since two non-zero
vectors a and b are orthogonal (i.e. perpendicular, in this case) if and only
if a · b = 0.


Example 6 


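Consider the vectors a = 2i − 3j + k and b = −i + 2j + 4k. Find the
magnitudes of a and b, and the angle between them.


Solution

$|\mathbf{a}| = \sqrt{2^2 + (-3)^2 + 1^2} = \sqrt{14}$,

$|\mathbf{b}| = \sqrt{(-1)^2 + 2^2 + 4^2} = \sqrt{21}$.
However, from equation (22),

$\mathbf{a} \cdot \mathbf{b} = (2 \times (-1)) + ((-3) \times 2) + (1 \times 4) = -4$,
so if θ is the angle between a and b, then

$\cos\theta = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \dfrac{-4}{\sqrt{14} \times \sqrt{21}} = -\dfrac{4}{7\sqrt{6}}$.


The negative sign means that θ is obtuse, so θ ≈ 1.806 radians.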


Exercise 12 

(a) If a = 4i + j − 5k and b = i − 3j + k, show that a · b = −4. What
does the negative sign tell you?

(b) Are the vectors c = (3, 5, −2) and d = (3, −1, −2) orthogonal?


Exercise 13 


If p = 3i + 2j − k, q = −i + j + 2k and r = 2i − j − k, and λ is a scalar,
find the value of λ that makes p + λq orthogonal to r.


Resolving a vector into perpendicular components 


The process of splitting a vector into components in specified 
perpendicular directions is called resolution. So when we write a in 
component form as $\mathbf{a} = (a_x, a_y, a_z)$ or $\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$, we are showing
how a may be resolved into its Cartesian components. 


Figure 20 The vectors $\mathbf{W}_p$ and $\mathbf{W}_n$ are respectively parallel and normal to the plane; since they are perpendicular, $\mathbf{W} = \mathbf{W}_p + \mathbf{W}_n$


Figure 21 Finding the scalar component of a in the direction of a unit vector $\hat{\mathbf{u}}$


However, it is often useful to be able to resolve a given vector into
components along perpendicular directions that are not aligned with the 
Cartesian unit vectors. This subsection will show you how to do this. 
First, however, we give a physical perspective on the sort of situation in 
which the technique is useful. 


Resolving a vector: a physical perspective 


Figure 20 shows a box on a rough wooden plane inclined at an angle θ
to the horizontal. The owner of the box wants scientific advice on the
maximum angle θ that can be tolerated before the box starts to slide
down the plane.


The situation is described in a two-dimensional system of Cartesian 
coordinates with the x-axis pointing to the right and the y-axis 
pointing vertically upwards. In that system the weight of the box (the 
force exerted on the box by the Earth’s gravity) points vertically 
downwards and is described by the vector W = —Wj. 


We will not go into the details of the analysis, but what is crucial is 
the ability to express W as the sum of two orthogonal vectors, one 
pointing parallel to the plane (denoted $\mathbf{W}_p$), the other directed at
right angles (i.e. normal) to the plane (denoted $\mathbf{W}_n$). We do this by
resolving W in these directions. 


As a general case, suppose that an arbitrary vector a makes an angle θ
with a unit vector $\hat{\mathbf{u}}$ (see Figure 21). Denote the scalar component of a in
the direction of $\hat{\mathbf{u}}$ by $a_u$. (Note that generally, $a_u$ may be positive or
negative, depending on the size of θ.) Simple trigonometry then shows that


$a_u = a\cos\theta \quad (0 \le \theta \le \pi)$,


but from the definition of the scalar product, and the fact that $|\hat{\mathbf{u}}| = 1$, we
see that


$\hat{\mathbf{u}} \cdot \mathbf{a} = a\cos\theta \quad (0 \le \theta \le \pi)$.


This implies the following result, which is true irrespective of the sign of $a_u$.


The scalar component of a in the direction of a unit vector $\hat{\mathbf{u}}$ is


$a_u = \hat{\mathbf{u}} \cdot \mathbf{a}$. (25)


Of course, this is just a generalisation of equation (23), which showed that
in the Cartesian case $a_x = \mathbf{i} \cdot \mathbf{a}$, $a_y = \mathbf{j} \cdot \mathbf{a}$ and $a_z = \mathbf{k} \cdot \mathbf{a}$. Note that
equation (25) can be used to find the ‘components’ (more generally called
projections) of a in the direction of any three unit vectors $\hat{\mathbf{u}}$, $\hat{\mathbf{v}}$ and $\hat{\mathbf{w}}$, but
only when those unit vectors are mutually perpendicular can we say that
$\mathbf{a} = a_u\hat{\mathbf{u}} + a_v\hat{\mathbf{v}} + a_w\hat{\mathbf{w}}$.


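Example 7
Consider Figure 22, which shows a two-dimensional vector a = (1, 3) and
two mutually perpendicular unit vectors $\hat{\mathbf{u}} = (1/\sqrt{2}, 1/\sqrt{2})$ and
$\hat{\mathbf{v}} = (-1/\sqrt{2}, 1/\sqrt{2})$.
(a) Show that a can be expressed in the form $\mathbf{a} = a_u\hat{\mathbf{u}} + a_v\hat{\mathbf{v}}$, with $a_u$ and
$a_v$ given by equation (25).


(b) Calculate $a_u$ and $a_v$, and hence express a as a linear combination of
the unit vectors $\hat{\mathbf{u}}$ and $\hat{\mathbf{v}}$.


Figure 22 The vector a = (1, 3) and the unit vectors $\hat{\mathbf{u}} = (1/\sqrt{2}, 1/\sqrt{2})$ and $\hat{\mathbf{v}} = (-1/\sqrt{2}, 1/\sqrt{2})$


Solution

(a) Since $\hat{\mathbf{u}}$ and $\hat{\mathbf{v}}$ are perpendicular unit vectors, we can certainly write
$\mathbf{a} = \alpha\hat{\mathbf{u}} + \beta\hat{\mathbf{v}}$, for some values of α and β. We can determine α and β
as follows.


From equation (25), the components of a in the $\hat{\mathbf{u}}$ and $\hat{\mathbf{v}}$ directions are

$a_u = \hat{\mathbf{u}} \cdot \mathbf{a} = \hat{\mathbf{u}} \cdot (\alpha\hat{\mathbf{u}} + \beta\hat{\mathbf{v}}) = \alpha\,\hat{\mathbf{u}} \cdot \hat{\mathbf{u}} + \beta\,\hat{\mathbf{u}} \cdot \hat{\mathbf{v}} = \alpha$,
$a_v = \hat{\mathbf{v}} \cdot \mathbf{a} = \hat{\mathbf{v}} \cdot (\alpha\hat{\mathbf{u}} + \beta\hat{\mathbf{v}}) = \alpha\,\hat{\mathbf{v}} \cdot \hat{\mathbf{u}} + \beta\,\hat{\mathbf{v}} \cdot \hat{\mathbf{v}} = \beta$.

Hence $\mathbf{a} = a_u\hat{\mathbf{u}} + a_v\hat{\mathbf{v}}$, with $a_u$ and $a_v$ the components of a in the $\hat{\mathbf{u}}$
and $\hat{\mathbf{v}}$ directions, respectively.


(b) We have
$a_u = \hat{\mathbf{u}} \cdot \mathbf{a} = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \cdot (1, 3) = \tfrac{1}{\sqrt{2}}(4) = 2\sqrt{2}$,
$a_v = \hat{\mathbf{v}} \cdot \mathbf{a} = \left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \cdot (1, 3) = \tfrac{1}{\sqrt{2}}(2) = \sqrt{2}$.
The vector a can therefore be written as the linear combination


$\mathbf{a} = 2\sqrt{2}\,\hat{\mathbf{u}} + \sqrt{2}\,\hat{\mathbf{v}}$.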
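The resolution carried out in Example 7 can be reproduced numerically. This Python sketch is an illustration under the assumption that the unit vectors are (1/√2, 1/√2) and (−1/√2, 1/√2); the variable names are chosen for this example. It computes the scalar components with equation (25) and then rebuilds a from them.

```python
import math

def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

s = 1 / math.sqrt(2)
a = (1.0, 3.0)
u_hat = (s, s)       # first unit vector
v_hat = (-s, s)      # second unit vector, perpendicular to the first

a_u = dot(u_hat, a)  # scalar component of a along u_hat, equation (25)
a_v = dot(v_hat, a)  # scalar component of a along v_hat

# Rebuild a as a_u * u_hat + a_v * v_hat; this works because the unit
# vectors are mutually perpendicular.
rebuilt = tuple(a_u * u + a_v * v for u, v in zip(u_hat, v_hat))

print(a_u, a_v)
print(rebuilt)
```

If the two unit vectors were not perpendicular, the projections could still be computed, but the final reconstruction step would no longer return a.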


Exercise 14 
Consider the vectors a = 2i − 3j + k and b = −i + 2j + 4k.
(a) Which of the following vectors is perpendicular to a?
c = −i + j + 3k, d = −2i + k, e = −i − j − k.
(b) Find the component of the vector a + 2b in the direction of the
displacement vector from the origin to the point (1, 1, 1).


(c) Find the component of the vector a + 2b in the direction of the vector
a − 2b.


Exercise 15


A vector v has magnitude 4 and makes an angle of π/3 with the negative
x-axis, as shown in the figure in the margin. Find the components of v in
the i- and j-directions, and hence express v as a linear combination of i
and j.


The product a × b is read as ‘a cross b’, and for this reason is often referred to as the cross product.


Exercise 16


The three unit vectors


$\hat{\mathbf{u}} = \dfrac{1}{\sqrt{2}}(1, 0, 1), \quad \hat{\mathbf{v}} = \dfrac{1}{\sqrt{2}}(1, 0, -1) \quad \text{and} \quad \hat{\mathbf{w}} = (0, 1, 0)$

are mutually perpendicular. Express the vector a = (2, 1, 0) as a linear
combination of these unit vectors.


2.2 The vector product 


The vector product of two given vectors is a vector, whose direction is 
perpendicular to both the given vectors. It can be defined in geometric 
terms as follows. 


The vector product of two vectors a and b is the vector quantity
$\mathbf{a} \times \mathbf{b} = (|\mathbf{a}|\,|\mathbf{b}| \sin\theta)\,\mathbf{n}$, (26)


where θ (0 ≤ θ ≤ π) is the angle between the directions of a and b,
and n is a unit vector that is normal (i.e. perpendicular) to both a
and b, and whose sense is given by the right-hand rule shown in
Figure 23.


Figure 23. The right-hand rule for vector products. To find the sense 
of the unit vector n that is normal to both a and b, first point the 
straightened fingers of your right hand in the direction of a. Then rotate 
your wrist until you find that you can bend your fingers in the direction 
of b. The outstretched thumb of your right hand then points in the sense 
of the unit vector n, which has the same direction as a × b.


Notice that n is not defined if a and b are parallel (θ = 0) or antiparallel
(θ = π), or if a or b is the zero vector. In each of these cases, either
sin θ = 0 or |a| = 0 or |b| = 0, so a × b = 0, the zero vector.


A special case worth remembering is that the vector product of a vector a
with itself is the zero vector: a × a = 0.


More generally, since the quantity n that determines the direction of a × b
is a unit vector, it follows that the magnitude of a × b is simply given by
|a| |b| sin θ.


The vector product, like the scalar product, is distributive over vector 
addition, so we can say that 

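$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) + (\mathbf{a} \times \mathbf{c})$. (27)
It is also linear with respect to multiplication by a scalar λ, so that

$\mathbf{a} \times (\lambda\mathbf{b}) = \lambda\,\mathbf{a} \times \mathbf{b}$. (28)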
These properties again allow us to expand expressions that involve sums 


and brackets in the usual way. 


However, unlike the scalar product, the vector product is not 
commutative. This means that for vectors that are non-zero and neither 
parallel nor antiparallel, a × b ≠ b × a. The reason for this is the use of
the right-hand rule to determine the sense of n. If we make b the first
term in the product, the right-hand rule will tell us to reverse the sense
of n, and that means changing the sign of the vector product, so


$\mathbf{a} \times \mathbf{b} = -\mathbf{b} \times \mathbf{a}$. (29)


This non-commutativity is a very important distinction that should always 
be kept in mind. 


Example 8 


Using the definition of the vector product, confirm that i × j = k,
j × k = i and k × i = j.
Solution


Using the definition of the vector product (including the right-hand rule),
and recalling that unit vectors have magnitude 1,


$\mathbf{i} \times \mathbf{j} = \left(|\mathbf{i}|\,|\mathbf{j}| \sin\tfrac{\pi}{2}\right)\mathbf{k} = \mathbf{k}$.
Similarly,
j × k = i and k × i = j.


Exercise 17 

(a) Confirm that j × i = −k, k × j = −i and i × k = −j.
(b) State and justify the value of i × i, j × j and k × k.
(c) Expand and simplify (i × (i + k)) − ((i + j) × k).

(d) Expand and simplify (i + k) × (i + j + k).


This result is not easily derived from equation (26).


Note that the correctness of these expressions is crucially dependent on the use of a right-handed system of coordinates.


Figure 24 The cyclic basis of the vector product. The arrows indicate a positive sense; products formed in the reverse sense incur a minus sign. This provides an easy way of remembering that i × j = k but j × i = −k, and so on.


The vector product in terms of components


The best way to get better insight into the vector product is to express it 
in terms of components. This will again mark an important transition 
from an approach that is primarily geometric to one that is more algebraic. 
Fundamental to this development are the results concerning the vector 
products of unit vectors that were discussed in the last subsection. 


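$\mathbf{i} \times \mathbf{j} = \mathbf{k}, \quad \mathbf{j} \times \mathbf{k} = \mathbf{i}, \quad \mathbf{k} \times \mathbf{i} = \mathbf{j}$, (30)

$\mathbf{j} \times \mathbf{i} = -\mathbf{k}, \quad \mathbf{k} \times \mathbf{j} = -\mathbf{i}, \quad \mathbf{i} \times \mathbf{k} = -\mathbf{j}$, (31)


and all other vector products of pairs of Cartesian unit vectors give 0.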


Using these results, together with the familiar rules of algebra but taking 
care not to change the order of vectors in any vector product, it can be 
seen that 
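$\mathbf{a} \times \mathbf{b} = (a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}) \times (b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k})$
$= a_x b_x\,\mathbf{i} \times \mathbf{i} + a_x b_y\,\mathbf{i} \times \mathbf{j} + a_x b_z\,\mathbf{i} \times \mathbf{k} + a_y b_x\,\mathbf{j} \times \mathbf{i} + a_y b_y\,\mathbf{j} \times \mathbf{j} + a_y b_z\,\mathbf{j} \times \mathbf{k} + a_z b_x\,\mathbf{k} \times \mathbf{i} + a_z b_y\,\mathbf{k} \times \mathbf{j} + a_z b_z\,\mathbf{k} \times \mathbf{k}$
$= (a_y b_z - a_z b_y)\mathbf{i} + (a_z b_x - a_x b_z)\mathbf{j} + (a_x b_y - a_y b_x)\mathbf{k}$.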


This gives us two equivalent ways of expressing the vector product. 


Component form of the vector product 
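If $\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}$ and $\mathbf{b} = b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}$, then

$\mathbf{a} \times \mathbf{b} = (a_y b_z - a_z b_y)\mathbf{i} + (a_z b_x - a_x b_z)\mathbf{j} + (a_x b_y - a_y b_x)\mathbf{k}$ (32)
or equivalently,


$\mathbf{a} \times \mathbf{b} = (a_y b_z - a_z b_y,\; a_z b_x - a_x b_z,\; a_x b_y - a_y b_x)$. (33)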


Note the pattern in equations (32) and (33). The x-component of the
product, $c_x$ say, is given by


$c_x = a_y b_z - a_z b_y$,


so the first three subscripts are x, y, z in alphabetical order. In both the
terms on the right above, the subscripts are y and z, but in the second
term ($-a_z b_y$) their order has reversed and the term has incurred an overall
minus sign. Similar comments apply to each of the other components. In
each case the first three subscripts are a cyclic rearrangement of x, y, z,
i.e. either y, z, x or z, x, y (see Figure 24). Also, in each case the final term
on the right involves a departure from cyclic reordering and incurs a minus
sign as a result. Note that the x-component of a × b is independent of $a_x$
and $b_x$, the y-component is independent of $a_y$ and $b_y$, and the z-component
is independent of $a_z$ and $b_z$.


Example 9
Evaluate a × b, where a = (2, 3, 4) and b = (1, −1, −3).
Solution
Using equation (33), and working in ordered triple notation,
a × b = (2, 3, 4) × (1, −1, −3)
= (3(−3) − 4(−1), 4(1) − 2(−3), 2(−1) − 3(1))
= (−5, 10, −5).


Since a × b should be perpendicular to both a and b, a quick check on our
calculation is provided by verifying that (−5, 10, −5) · a and
(−5, 10, −5) · b both vanish.
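The component formula and the perpendicularity check lend themselves to a machine calculation. The Python sketch below is illustrative; the function names `cross` and `dot` are choices made here. It evaluates the vector product of Example 9 from the component form and confirms that the result is perpendicular to both factors.

```python
def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product of two three-dimensional vectors, in component form."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

a = (2, 3, 4)
b = (1, -1, -3)
c = cross(a, b)

print("a x b =", c)
print("(a x b) . a =", dot(c, a))   # zero when c is perpendicular to a
print("(a x b) . b =", dot(c, b))   # zero when c is perpendicular to b
print("b x a =", cross(b, a))       # reversing the order changes the sign
```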


Despite their symmetry, equations (32) and (33) are not easy to remember 
and are mainly used in formal arguments and machine calculations. When 
it comes to calculations performed by hand, it is usual to employ a more 
memorable expression based on algebraic entities called determinants that 
will be described in Section 4 of this unit. For that reason we mainly defer 
exercises that require you to evaluate vector products until Section 4, 
where you will be able to use the determinant method. 


The vector product: a physical perspective


The vector product really opens up the world of three dimensions to
physical scientists and engineers. For example, the turning effect of a
force is described by a vector quantity called torque that is defined by
a vector product. Figure 25 shows a rigid rod with one end pivoted at
the origin O and the other end at the point P with position vector
r = (x, y, z). If a force F is applied to the rod at the point P, its
turning effect about the origin will be described by the torque
T = r × F. This will be true irrespective of the relative orientations
of r and F.


If F acts at right angles to r (see Figure 25(a)), then the torque about
the origin will be in the direction of T, perpendicular to r and F, and
will have magnitude T = rF. If the rod is pivoted in such a way that
it cannot rotate about an axis in the direction of T but must instead
rotate about some other axis through the origin, then the turning
effect of F will be described by the component of T along the allowed
axis, whatever its direction.


Similarly, if the force F is applied at an angle θ to the rod (see
Figure 25(b)), then the magnitude of its turning effect about an axis
through the origin in the direction of T will be reduced to
T = rF sin θ, and will vanish completely if F is parallel or antiparallel
to r since θ will then be 0 or π.


Figure 25 The torque T about the origin due to a force F applied at the point P with position vector r: (a) when the force is at right angles to the rod; (b) when the force is at an angle θ to the rod


Figure 26 An open part of the LHC shows the beam tubes and bending magnets for the electrically charged particles that are accelerated by the machine


Figure 27 A parallelogram


Figure 28 The perpendicular height of a parallelogram


As another example, consider an electrically charged particle
travelling through the powerful magnetic field inside CERN’s Large 
Hadron Collider (LHC) (see Figure 26). Such a particle is subject to 
an electromagnetic force F that acts at right angles to both the 
particle’s velocity vector v and the magnetic field vector, which is 
represented by B. As the particle moves through the LHC, the 
direction of the electromagnetic force continuously changes. Yet no 
matter what the orientations of the particle’s velocity and the 
magnetic field, the force is at all times described by the equation
F = qv × B, where q is the charge on the particle.


Areas and vector products 


The vector product has several useful geometric applications. The 
following example introduces one of them. 


Example 10 


Any two non-zero vectors a and b define a parallelogram, as shown in
Figure 27. Find an expression for the area of the parallelogram in terms of
a × b.


Solution


The area A of the parallelogram defined by the two vectors a and b is
equal to the product of its base length |a| and its perpendicular height
h = |b| sin θ (see Figure 28). Thus A = |a| |b| sin θ, and this is the
magnitude of a × b. So the area of the parallelogram is


A = |a × b|.


The area A of a parallelogram with sides defined by vectors a and b
is given by


A = base length × perpendicular height = |a × b|. (34)


This result can be used to determine the areas of other figures, such as a 
rectangle (a special kind of parallelogram with a perpendicular to b) or a 
triangle, which has half the area of the corresponding parallelogram. 
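As a sketch of how equation (34) might be used in practice, the following Python fragment computes parallelogram and triangle areas from the vector product. The function names and the sample points are assumptions made for this illustration and are not taken from the exercises.

```python
import math

def cross(a, b):
    """Vector product of two three-dimensional vectors, in component form."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def parallelogram_area(a, b):
    """Area of the parallelogram with sides a and b: A = |a x b| (equation (34))."""
    return math.sqrt(sum(x * x for x in cross(a, b)))

def triangle_area(p, q, r):
    """Area of the triangle with corners p, q, r: half the parallelogram on two edges."""
    pq = tuple(qi - pi for pi, qi in zip(p, q))
    pr = tuple(ri - pi for pi, ri in zip(p, r))
    return 0.5 * parallelogram_area(pq, pr)

# A 3-by-2 rectangle in the (x, y) plane, and a right-angled triangle with legs 1 and 2.
print(parallelogram_area((3, 0, 0), (0, 2, 0)))
print(triangle_area((1, 1, 1), (2, 1, 1), (1, 3, 1)))
```

Forming the edge vectors from one chosen corner means the triangle need not have a corner at the origin.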


Exercise 18 


Using position vectors, find the area of a triangle with corners at the 
points (0, 0, 0), (2, 1, 1) and (1, −1, −1).


Exercise 19


Using the vector product, confirm that the area of a parallelogram with
corners at the points (0, 0, 0), (a, b, 0), (c, d, 0) and (a + c, b + d, 0) is
|ad − bc|. Check that the formula gives the right answer in the case that
the parallelogram is a rectangle, with b = c = 0.


Volumes and triple products 


A parallelepiped is a solid body like a distorted brick, all of whose faces 
are parallelograms, as shown in Figure 29. 


The volume V of a parallelepiped with sides defined by vectors a, b
and c is given by


V = base area × vertical height = |(a × b) · c|. (35)


Figure 29 A parallelepiped


That this formula is correct can be seen as follows. The base is a
parallelogram defined by the vectors a and b. The area of the base is
therefore equal to the magnitude of a × b. The vertical height h is the
magnitude of the scalar component of the vector c in the direction
perpendicular to the base. That direction is parallel or antiparallel to the
direction of a × b. So the product (a × b) · c has a magnitude equal to the
base area times the perpendicular height. We may therefore express the
volume of the parallelepiped as |(a × b) · c|.


The expression for the volume of the parallelepiped involves the quantity 
(a × b) · c, which is an example of the scalar triple product of vectors.
As the name implies, it is a scalar quantity obtained from three vectors.
Such products arise in many situations and can be written in a number of
equivalent ways. For example, by expressing all the products in terms of
components, it is also possible to establish a cyclic identity according to
which


$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \mathbf{b} \cdot (\mathbf{c} \times \mathbf{a}) = \mathbf{c} \cdot (\mathbf{a} \times \mathbf{b})$. (36)


Combining this with the freedom to exchange the order of vectors in the
scalar product (but not in the vector product) leads to some more
equivalent expressions, including the following relationship that will be
used in a later unit:


$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$. (37)
Proving the equality of these two expressions involves a lot of algebraic
manipulation of their components. This is straightforward but time
consuming. However, the equality of their magnitudes is obvious, since
either of those magnitudes can be used to describe the volume of the
parallelepiped with sides defined by the three vectors a, b and c.
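The cyclic identity and the volume formula can be spot-checked numerically for particular vectors (a check for one set of vectors is of course not a proof). The Python sketch below uses illustrative names and sample vectors that are assumptions of this example.

```python
def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product of two three-dimensional vectors, in component form."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def scalar_triple(a, b, c):
    """Scalar triple product (a x b) . c."""
    return dot(cross(a, b), c)

# Illustrative vectors defining a parallelepiped.
a = (1, 0, 0)
b = (1, 2, 0)
c = (1, 1, 3)

volume = abs(scalar_triple(a, b, c))   # equation (35)

print("volume =", volume)
print(dot(a, cross(b, c)), dot(b, cross(c, a)), dot(c, cross(a, b)))
print("swapping two vectors:", scalar_triple(b, a, c))
```

The last line shows that exchanging two of the vectors reverses the sign of the triple product, which is why the volume is defined using the magnitude.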
The scalar triple product (in all its equivalent forms) is not the only way of
forming a meaningful product of three vectors. There is also a vector
triple product, (a × b) × c. This produces a vector quantity that will be
perpendicular to c and to the vector that results from a × b.


Since it involves taking vector products, the vector triple product is
naturally not commutative. Moreover, the vector triple product is not
associative either. That is, a × (b × c) is generally different from
(a × b) × c. This may seem rather surprising but is easily established as
follows. Since (a × b) × c is perpendicular to the direction of a × b, its
only non-zero components must be in the plane that is perpendicular to
a × b, i.e. the plane containing a and b. Similarly, the only non-zero
components of a × (b × c) must be in the plane containing b and c. So,
provided that a, b and c are not all in the same plane, non-zero triple
products a × (b × c) and (a × b) × c will generally point in different
directions; they can share a direction only along the line where the two
planes meet, i.e. along b.
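A single numerical case is enough to show that the vector triple product is not associative. The Python sketch below uses sample vectors chosen for this illustration; the names are assumptions of the example.

```python
def cross(a, b):
    """Vector product of two three-dimensional vectors, in component form."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

# Three vectors that do not all lie in one plane.
a = (1, 0, 0)
b = (1, 1, 0)
c = (0, 1, 1)

left = cross(cross(a, b), c)    # (a x b) x c, which lies in the plane of a and b
right = cross(a, cross(b, c))   # a x (b x c), which lies in the plane of b and c

print("(a x b) x c =", left)
print("a x (b x c) =", right)
```

For these vectors the first result is a multiple of a and the second is a multiple of c, so the two groupings give vectors in different directions.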


Exercise 20 


Suppose that a and b have a non-zero vector product, and that c is a 
non-zero vector such that (a × b) × c = 0. What can you say about the
direction of c? 


We end our discussion of vectors with a warning. Although a powerful 
vector algebra has been developed with operations of scaling, addition and 
two kinds of multiplication, there is no vector division. So do not try to 
divide by a vector. The absence is caused by a lack of uniqueness in 
attempts to define vector division. When dealing with non-zero scalar 
quantities, the equation ax = b has the unique solution x = b/a. When
dealing with non-zero vectors, however, the corresponding equation
a · x = b, where b is a scalar, has many solutions. If x = x₁ is one solution
(so a · x₁ = b), then another solution is x₂ = x₁ + λc, where λ is any scalar
and c is any vector orthogonal to a (so a · (λc) = 0, and hence
a · x₂ = a · x₁ = b).
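The lack of uniqueness is easy to see with numbers. In the Python sketch below, which is an illustration with values chosen for this example, one solution of a · x = b is built as a multiple of a, and further solutions are produced by adding multiples of a vector orthogonal to a.

```python
def dot(a, b):
    """Scalar product of two vectors given as tuples of components."""
    return sum(x * y for x, y in zip(a, b))

a = (1.0, 2.0, 2.0)
b = 9.0

# One particular solution: a multiple of a itself, x1 = (b / |a|^2) a.
factor = b / dot(a, a)
x1 = tuple(factor * ai for ai in a)

# Any vector orthogonal to a can be added without changing a . x.
c = (2.0, -1.0, 0.0)
assert dot(a, c) == 0.0

for lam in (-3.0, 0.5, 4.0):
    x2 = tuple(xi + lam * ci for xi, ci in zip(x1, c))
    print(lam, x2, dot(a, x2))
```

Every value of the scalar multiplier gives a different vector with the same scalar product with a, so no single 'quotient' b/a can be singled out.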


3 Matrices, vectors and linear 
transformations 


In Sections 1 and 2 of this unit you were introduced to the algebra of 
vectors. Sections 3 and 4 will provide a comparable introduction to the 
algebra of matrices. As you will see, there are many deep links between 
vectors and matrices, but there are also some important differences. In 
particular, the treatment of matrices is generally more ‘algebraic’ and less 
‘geometric’. This can make matrix algebra appear more abstract and 
harder to visualise. For that reason, rather than plunging directly into the 
presentation of algebraic rules for scaling, adding and multiplying
matrices, most of Section 3 will be devoted to examining a particular
application of matrices that emphasises their use in geometry. This is the 
subject of linear transformations of a plane, which will be introduced 
below, alongside matrices themselves. This will allow us to introduce some 
of the basic ideas of matrix algebra in a concrete setting. The more general 
aspects of matrix algebra will then be introduced in Section 4. 


3.1 Matrices and linear transformations 


Matrices 


Here are four rectangular arrays of numbers enclosed in brackets. 


—2 1 4 0.8 

2 -7 
(a) (b) 3.0 -1|] (c) |-7 az} (d) J03 
E s 23 2 | : 0.6 


These are all examples of matrices. A matrix can be defined as follows. 


A matrix is a rectangular array of elements (usually numbers or 
physical quantities) arranged in rows and columns, and enclosed in 
brackets. It obeys several mathematical rules that collectively 
comprise matrix algebra. 


A matrix with m rows and n columns is said to be of order m × n. We 
generally represent entire matrices by symbols printed in bold type, so if A 
represents an m × n matrix, we may write 

\[
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \tag{38}
\]

where $a_{ij}$ represents the element in the ith row and the jth column. (The 
significance of the symbol $a_{ij}$ is sometimes recalled using the mnemonic 
‘arc’: element a, row i, column j.) 


A matrix in which all of the elements are zero is called a zero matrix and 
will be denoted 0, irrespective of its order. Matrices of order n x n have 
equal numbers of rows and columns and are called square matrices. 
(Examples (a) and (b) given above are square matrices.) Any matrix of 
order 1 x n takes the form of a single row of elements and is called a row 
matrix. Similarly, a matrix of order n x 1 is called a column matrix. 
(Examples (c) and (d) given above are row and column matrices, 
respectively.) If n = 2 or 3, a row matrix looks a lot like a vector presented 
as an ordered pair or an ordered triple, though the matrices do not contain 
commas. In fact, row and column matrices are often used to represent 
vectors algebraically. For that reason we often refer to row and column 
matrices as row vectors and column vectors. 


We follow the convention of 
using square brackets to indicate 
matrices. Some texts use round 
brackets. 


m × n is read as ‘m by n’; the × 
does not mean multiplication. 


There is no universal convention 
on how to hand-write matrices. 
Capital letters are often used, 
except when the matrix 
represents a vector. Some people 
underline with a straight line, 
some with a curly line; some 
even underline twice. We leave 
it to you to choose a convention. 


Figure 30 A plane and an 
overlying coordinate system. 
A square edged by unit 
vectors and a grid of straight 
lines have been drawn on the 
plane. Distortions of the 
plane do not affect the 
coordinate system. 


Two matrices are said to be equal if they have the same order and each of 
the corresponding elements is equal. So, for example, 
$\begin{bmatrix} 3 & 8 \end{bmatrix} = \begin{bmatrix} 1 + 2 & 4 \times 2 \end{bmatrix}$, but $\begin{bmatrix} 3 & 8 \end{bmatrix} \neq \begin{bmatrix} 3 & 8 & 0 \end{bmatrix}$ because the last matrix is 
not of order 1 × 2. More formally, we have the following. 


Two matrices A and B are equal if they have the same order, m × n say, and 
$a_{ij} = b_{ij}$ for all $i = 1, \ldots, m$ and for all $j = 1, \ldots, n$. 
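
The definition of equality translates directly into a check on order followed by an element-by-element comparison. In this sketch a matrix is stored as a list of rows; that storage choice and the function names `order` and `matrices_equal` are assumptions made for illustration.

```python
def order(m):
    """Return the order (rows, columns) of a matrix stored as a list of rows."""
    return (len(m), len(m[0]))

def matrices_equal(a, b):
    """True when a and b have the same order and equal corresponding elements."""
    if order(a) != order(b):
        return False
    return all(a[i][j] == b[i][j]
               for i in range(len(a)) for j in range(len(a[0])))

same = matrices_equal([[3, 8]], [[1 + 2, 4 * 2]])   # both of order 1 x 2
different = matrices_equal([[3, 8]], [[3, 8, 0]])   # orders 1 x 2 and 1 x 3
```

The second comparison fails at the first step, because the orders differ, before any elements are examined.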


Exercise 21 


What is the order of each of the four example matrices (a), (b), (c) and (d) 
given at the start of this subsection? 


Linear transformations 


Figure 30 represents a two-dimensional plane overlaid by a 
two-dimensional Cartesian coordinate system. This subsection concerns 
‘transformations’ that affect the points in the plane but not the coordinate 
system. That’s why the coordinates were described as ‘overlaid’. You 
might like to think of the plane as an elastic sheet that can be rotated or 
stretched, while the coordinate axes are drawn on an overlying transparent 
plastic sheet that is not affected by the distortion of the underlying plane. 


In Figure 30 we have drawn the (‘overlaid’) coordinate axes in black. We 
have also drawn a grid of straight (white) lines and a square of unit area 
with unit vectors along its edges. The grid lines and the vectors should be 
thought of as drawn on the plane (which is coloured grey), so they will all 
be affected by the transformation of the plane. 


Using the overlying coordinates, any point in the plane can be described 
by a position vector (x,y). Such a point can also be represented by a row 
vector or by a column vector; in the case of a column vector we call it the 
position column vector and denote it by 

\[
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}.
\]


The usual Cartesian unit vectors on the plane can be described by (1,0) 
and (0,1). These too may be represented by appropriate row or column 
vectors; in the case of column vectors we call them unit column vectors, 
and denote them by 


\[
\mathbf{i} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \mathbf{j} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

(Note that by reusing the symbols i and j in this new way we are 
deliberately blurring the distinction between vectors and column matrices. 
It should be clear from the context which of them we are referring to, but 
the point is to demonstrate that matrices provide another way of dealing 
with vectors.) 




Figure 31 shows the effect of subjecting the whole plane to a 
transformation that moves any point with position vector (x,y) to a new 
location with position vector (ax + by, cx + dy), where a, b, c and d are 
real numbers. The particular transformation shown in Figure 31 was 
obtained using the values a = 1.0, b = −0.9, c = 0.3 and d = 1.5, but it is a 
typical example of what is generally known as a linear transformation 
of the plane. Note that although the transformation generally tends to 
move points in the plane, and therefore changes their coordinates, it does 
not change the coordinates of the point at the origin. That particular 
point in the plane starts with the coordinates (x,y) = (0,0), and the effect 
of the transformation is to make its new coordinates 

(ax + by, cx + dy) = (0,0), so the origin does not move at all. This is 
characteristic of linear transformations. 


Our aim now is to show how linear transformations of the plane can be 
represented in a very natural way using matrices. As a first step towards 
this goal, we introduce the following multiplication rule for determining the 
product of a 2 x 2 square matrix and a 2 x 1 column matrix. You will see 
later that this is a special case of the general rule for matrix multiplication. 


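The product of a 2 x 2 matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and a 2 x 1 matrix 
$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ is a 2 x 1 matrix given by 

\[
\mathbf{A}\mathbf{x} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}. \tag{39}
\]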


At first sight this rule may seem rather arbitrary, but there is actually a 
very sensible pattern behind it. As indicated by the hand symbols in 
Figure 32(a), the expression ax + by in the first row and first (and only) 
column of the product, is the result of adding the products of the 
corresponding elements in the first row of A and the first (and only) 
column of x. Similarly, as indicated in Figure 32(b), the expression cx + dy 
in the second row and first (and only) column of the product, is the result 
of adding the products of the corresponding elements in the second row 

of A and the first (and only) column of x. 
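
The row-times-column pattern can be written out as a very small function. This is a sketch only: the name `mat_vec`, the storage of A as a list of two rows, and the numerical entries are all assumptions made for illustration.

```python
def mat_vec(A, x):
    """Product of a 2 x 2 matrix A (list of two rows) and a 2 x 1 column vector x."""
    (a, b), (c, d) = A
    x1, y1 = x
    # First row of A with the column, then second row of A with the column.
    return (a * x1 + b * y1, c * x1 + d * y1)

A = [[2, -1],
     [3, 4]]                   # hypothetical entries a, b, c, d
image = mat_vec(A, (5, 6))     # (2*5 - 1*6, 3*5 + 4*6) = (4, 39)
origin = mat_vec(A, (0, 0))    # the origin is not moved
```

Applying the function to (0, 0) returns (0, 0) whatever entries are chosen, in line with the remark that a linear transformation leaves the origin fixed.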


Figure 32 (a) Obtaining ax + by; (b) obtaining cx + dy 


If we now substitute into the matrix A the values a = 1.0, b = −0.9, 
c = 0.3 and d = 1.5 that characterise the transformation shown in 
Figure 31, we see that 

\[
\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1.0 & -0.9 \\ 0.3 & 1.5 \end{bmatrix}.
\]


Figure 31 The effect of a 
linear transformation on the 
plane. The transformation 
will generally change the 
coordinates of points in the 
plane but has no effect on the 
coordinate system (shown in 
black). The origin is 
unaffected, and straight lines 
are transformed into straight 
lines. 


Note how naturally we can move 
between the vector notation (., .) 
and the column vector 
notation $\begin{bmatrix} \cdot \\ \cdot \end{bmatrix}$. 


Note that the quantity on the 
extreme right is a 2 x 1 column 
vector, not a 2 x 2 matrix. 

The multiplication rule given in equation (39) then tells us that the 
product of A and the unit column vector i is 


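\[
\mathbf{A}\mathbf{i} = \begin{bmatrix} 1.0 & -0.9 \\ 0.3 & 1.5 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1.0 \\ 0.3 \end{bmatrix},
\]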
and this exactly describes what happens to the unit vector (1,0) when it is 


affected by the linear transformation of Figure 31; it becomes the vector 
(1.0, 0.3). 




Similarly, according to the multiplication rule, the product of A and the 
unit column vector j is 


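\[
\mathbf{A}\mathbf{j} = \begin{bmatrix} 1.0 & -0.9 \\ 0.3 & 1.5 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.9 \\ 1.5 \end{bmatrix},
\]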
which exactly describes what happens to the unit vector (0,1) when it is 


affected by the linear transformation of Figure 31; it becomes the vector 
(—0.9, 1.5). 


In fact, the multiplication rule tells us that the product of A and a 
position column matrix x (representing a general point in the plane) is just 

\[
\mathbf{A}\mathbf{x} = \begin{bmatrix} 1.0 & -0.9 \\ 0.3 & 1.5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1.0x - 0.9y \\ 0.3x + 1.5y \end{bmatrix},
\]

which exactly represents the general effect of the linear transformation on 
a position vector (x,y); it becomes the position vector 
(1.0x − 0.9y, 0.3x + 1.5y). 


So, thanks to the multiplication rule for matrices, we can say that the 
effect of the particular linear transformation of the plane that was shown 
in Figure 31 is to transform any point represented by a 2 x 1 column 
vector x into the point represented by the 2 x 1 matrix product Ax, where 


\[
\mathbf{A} = \begin{bmatrix} 1.0 & -0.9 \\ 0.3 & 1.5 \end{bmatrix}.
\]

Though this is only one example, you should not be surprised by the 
following general rule. 


Any linear transformation of the plane can be represented by a 
2 x 2 matrix A. The effect of the linear transformation on a position 
vector represented by the 2 x 1 column vector x is to transform it 
into the position vector represented by the 2 x 1 matrix product Ax. 


Of course, all this is deliberately ponderous for the sake of clarity. Those 
familiar with such transformations usually just say ‘A transforms x 
to Ax’. Indeed, mathematicians often prefer to describe the effect of the 
linear transformation in terms of the ‘mapping’ of vectors rather than their 
transformation, and will generally say ‘A maps x to Ax’. The column 
vector Ax is then described as the image of x under the mapping 
represented by A. The implication is the same, whichever form of language 
is used: 2 x 2 matrices can be used to represent linear transformations of 
the plane, and to work out their effects on position vectors. 


Example 11 


What is the effect of the linear transformation represented by A = fi Hl 


on a point with coordinates (2,1)? 
Solution 
First, we note that a point with coordinates (2,1) can be represented by 
the position column vector $\mathbf{x} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. 
The effect of the transformation on x is then given by 
ee fi Hl | _ Keeeeee _ Hl 
1 4} }1 1x2+4x1 6] ° 


So the point (2,1) is transformed to the point (8,6). 


Exercise 22 


Consider the linear transformation represented by A = F Hl : 


Use the matrix A to find the effect of the transformation on each of the 
following position column vectors. 


We have now achieved our aim of showing how linear transformations of 
the plane can be represented using matrices. In the next subsection we 
consider matrix representations of some specific linear transformations 
with easily visualised actions. 


3.2 Some linear transformations of the plane 


In this section we examine some specific transformations of the plane. 
Identifying the 2 x 2 matrix 

\[
\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

associated with a particular transformation of the plane is most easily 
accomplished by examining the effect of the matrix on the unit column 
vectors. To see the reason for this, consider the matrix products 

\[
\mathbf{A}\mathbf{i} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} a \\ c \end{bmatrix}, \qquad
\mathbf{A}\mathbf{j} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} b \\ d \end{bmatrix}.
\]


Figure 33 The rescaling of 
the plane by a factor of 2 in 
the positive y-direction 


These show that the transformation represented by the matrix A is the 
one that maps i = (1,0) to (a,c), and also maps j = (0,1) to (b,d). This 
leads to the following general principle. 


The square matrix A describing a given linear transformation has 
columns that are identical to the column vectors produced by the 
action of the transformation on the unit column vectors. 


So if you know what the transformation does to the unit vectors, then you 
can write down its matrix. The use of this principle is shown in the 
following example. 
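
As a sketch of this principle, the matrix of the transformation in Figure 31 can be assembled from the images of the two unit vectors quoted earlier, (1.0, 0.3) and (−0.9, 1.5). The function names `matrix_from_images` and `mat_vec` are assumptions introduced for illustration.

```python
def matrix_from_images(image_of_i, image_of_j):
    """Build the 2 x 2 matrix whose columns are the images of i and j."""
    (a, c), (b, d) = image_of_i, image_of_j
    return [[a, b], [c, d]]

def mat_vec(A, x):
    """Product of a 2 x 2 matrix A and a 2 x 1 column vector x."""
    (a, b), (c, d) = A
    x1, y1 = x
    return (a * x1 + b * y1, c * x1 + d * y1)

# Images of the unit vectors under the transformation of Figure 31.
A = matrix_from_images((1.0, 0.3), (-0.9, 1.5))

# Multiplying the unit column vectors by A gives back the columns of A.
first_column = mat_vec(A, (1, 0))
second_column = mat_vec(A, (0, 1))
```

The matrix obtained in this way has rows (1.0, −0.9) and (0.3, 1.5), which agrees with the values a, b, c and d used for Figure 31.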


Example 12 


The matrix $\mathbf{D} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ is said to represent a rescaling (or dilation) of 
the plane by a factor of 2 in the positive y-direction. 


(a) Work out the effect of this transformation on the point with position 
vector (1,2), and the point with position vector (2, 1). 


(b) Write down the effect of the transformation on the unit vectors (1,0) 
and (0,1), and justify your answer. 


(c) Sketch a diagram to show the general effect of the transformation on 
the plane and the unit square in Figure 30. Comment on the 
appropriateness of the description of its action. 


Solution 


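(a) The matrix product 

\[
\mathbf{D} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix}
\]

shows that the position vector (1,2) is mapped to (1, 4). 


The matrix product 

\[
\mathbf{D} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}
\]

shows that the position vector (2,1) is mapped to (2, 2). 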


(b) (1,0) is mapped to (1,0), and (0,1) is mapped to (0,2). This is in 
agreement with the comments made above. The unit vectors (1,0) and 
(0,1) are represented in matrix notation by the unit column vectors. 
The result of multiplying these unit column vectors by D is to produce 
the columns of D itself. 


(c) Figure 33 shows the effect of the transformation. It stretches the plane 
by a factor of 2 in the positive y-direction, making the description of 
its action appropriate. 




The general rescaling of the plane allows the x-coordinate of every point to 
be multiplied by a real number $\kappa$, while the y-coordinate is multiplied 
by $\lambda$. Such a transformation is shown in Figure 34 and is represented by 
the following matrix. 


The two-dimensional dilation matrix 


\[
\mathbf{D}(\kappa, \lambda) = \begin{bmatrix} \kappa & 0 \\ 0 & \lambda \end{bmatrix} \tag{40}
\]
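
A short sketch may help to show the dilation matrix in action. The factors 4 and 3 and the function names `dilation` and `mat_vec` are illustrative assumptions, not values taken from the text.

```python
def dilation(kappa, lam):
    """Dilation matrix D(kappa, lam): scales x by kappa and y by lam."""
    return [[kappa, 0], [0, lam]]

def mat_vec(A, x):
    """Product of a 2 x 2 matrix A and a 2 x 1 column vector x."""
    (a, b), (c, d) = A
    x1, y1 = x
    return (a * x1 + b * y1, c * x1 + d * y1)

D = dilation(4, 3)                          # hypothetical scale factors
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [mat_vec(D, p) for p in unit_square]
moved_point = mat_vec(D, (2, -1))           # (4*2, 3*(-1)) = (8, -3)
```

The corners of the unit square are sent to the corners of a rectangle with sides parallel to the axes, so a dilation stretches the plane along the axes without turning it.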


Figure 34 The general rescaling of the plane 


Exercise 23 


The matrix $\mathbf{D}(3, 2) = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$ represents a rescaling of the plane. 

(a) Write down the effect of the transformation on the unit vectors (1,0) 
and (0,1). 
(b) Work out the effect of this transformation on the point with 
coordinates x = 3 and y = —2. 


(c) What is the effect of the transformation on the area of a unit square 
(i.e. a square that may be edged by unit vectors)? 


Rotations about the origin constitute an important class of linear 
transformations, often encountered in science and engineering. The effect 
of a rotation about the origin by an angle $\alpha$ in the positive 
(i.e. anticlockwise) sense is shown in Figure 35. Such a transformation of 
the plane can be represented by the following matrix. 


The two-dimensional rotation matrix 

\[
\mathbf{R}(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \tag{41}
\]


Figure 35 The rotation of 
the plane represented by $\mathbf{R}(\alpha)$ 


Exercise 24 


(a) Recalling that $\sin\frac{\pi}{4} = \cos\frac{\pi}{4} = \frac{1}{\sqrt{2}}$, write down the matrix that 
represents a rotation about the origin by $\frac{\pi}{4}$ radians in the 
anticlockwise sense. (Be explicit, i.e. replace trigonometric functions 
by arithmetical quantities.) 


(b) Write down the effect of this transformation on the unit vectors (1,0) 


and (0,1). 
(c) Work out the effect of this transformation on the point with 
coordinates « = —1 and y = —1. 


(d) What is the effect of the transformation on the area of a unit square? 
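
The rotation matrix of equation (41) can also be explored numerically. The sketch below uses a quarter turn, an angle chosen purely for illustration; the function names `rotation` and `mat_vec` are assumptions, and because floating-point arithmetic is used the results are only approximately (0, 1) and (−1, 0).

```python
import math

def rotation(alpha):
    """Rotation matrix R(alpha) for an anticlockwise rotation by alpha radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def mat_vec(A, x):
    """Product of a 2 x 2 matrix A and a 2 x 1 column vector x."""
    (a, b), (c, d) = A
    x1, y1 = x
    return (a * x1 + b * y1, c * x1 + d * y1)

R = rotation(math.pi / 2)        # a quarter turn anticlockwise
image_i = mat_vec(R, (1, 0))     # approximately (0, 1)
image_j = mat_vec(R, (0, 1))     # approximately (-1, 0)
```

A quarter turn carries the unit vector along the x-axis onto the unit vector along the y-axis, as a sketch of the plane would suggest.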

We end with one of the simplest transformations of the plane: the 
identity transformation. This is the transformation that leaves everything 
where it was. It is represented by the following matrix. 


The two-dimensional identity matrix 


\[
\mathbf{I} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{42}
\]


Exercise 25 


The identity matrix I can arise as a special case of the dilation matrix 
$\mathbf{D}(\kappa, \lambda)$ or the rotation matrix $\mathbf{R}(\alpha)$. For what values of $\kappa$, $\lambda$ and $\alpha$ will 
this happen? 


3.3 Basic matrix algebra and successive 
transformations 


Now that several different matrices have been studied, we can start to 
develop some of the ideas of basic matrix algebra. For now we restrict the 
detailed discussion to two dimensions, so that the concrete setting of 
transformations of the plane can continue to be used. A more general 
treatment will be provided in Section 4. 


Note that throughout this subsection we will be making extensive use of 
subscripts to distinguish matrix elements; in particular, we will write 


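\[
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}.
\]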


Scaling and adding matrices 


We start by defining the scaling and adding of matrices, both of which 
have very natural definitions. 


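Given a matrix A of any order, the operation of scaling the matrix 
by a scalar $\lambda$ produces a matrix $\lambda\mathbf{A}$, of the same order, in which each 
element is multiplied by $\lambda$. Thus, in the case of a 2 x 2 matrix, 

\[
\lambda\mathbf{A} = \lambda \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} \lambda a_{11} & \lambda a_{12} \\ \lambda a_{21} & \lambda a_{22} \end{bmatrix}. \tag{43}
\]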


Given two matrices A and B of the same order, the operation of 
adding the two matrices produces a matrix A + B, of the same 
order, in which each element is the sum of the corresponding elements 
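of A and B. 


Thus, in the case of 2 x 2 matrices, 

\[
\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}. \tag{44}
\]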
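
Both operations act element by element, which makes them simple to sketch in code. The function names `scale` and `add`, the list-of-rows storage and the numerical entries are assumptions chosen for illustration.

```python
def scale(lam, A):
    """Scale every element of the matrix A by the scalar lam."""
    return [[lam * element for element in row] for row in A]

def add(A, B):
    """Add two matrices of the same order, element by element."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same order")
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]      # hypothetical 2 x 2 matrices
B = [[0, -1], [5, 2]]
scaled = scale(3, A)      # [[3, 6], [9, 12]]
total = add(A, B)         # [[1, 1], [8, 6]]
```

Nothing in these definitions is special to 2 x 2 matrices, so the same functions work for matrices of any order, provided that the two matrices being added have the same order.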


As you saw in Section 1, a position vector r may be written as a linear 
combination of unit vectors. We can now use the scaling and adding of 
matrices, together with the unit column vectors i and j, to provide a 
similar way of writing a position column vector x. 


Example 13 


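Write the position column vector $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$ as a linear combination of the 
unit column vectors i and j. 


Solution 

\[
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} = x \begin{bmatrix} 1 \\ 0 \end{bmatrix} + y \begin{bmatrix} 0 \\ 1 \end{bmatrix} = x\mathbf{i} + y\mathbf{j}.
\]
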
Exercise 26 


Evaluate the result of the following scalings and additions. 


waif] oa(]aL) [33 


Multiplying matrices 


You have already had plenty of practice at multiplying 2 x 2 square 
matrices and 2 x 1 column matrices. Now we extend matrix multiplication 
to the case of multiplying one 2 x 2 matrix by another 2 x 2 matrix. The 
result is always a 2 x 2 matrix, and the four elements in the product are 
worked out using a straightforward extension to the method used earlier. 
Formally, we can define the product of two 2 x 2 matrices as follows. 


The product of a 2 x 2 matrix A and a 2 x 2 matrix B is another 
2 x 2 matrix given by 


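\[
\mathbf{A}\mathbf{B} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
= \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}. \tag{45}
\]


Figure 37 The effect of successive transformations on $\mathbf{x}_0$: 
$\mathbf{D}(\kappa, \lambda)$ takes $\mathbf{x}_0$ to $\mathbf{x}_1$, and $\mathbf{R}(\alpha)$ then takes $\mathbf{x}_1$ to $\mathbf{x}_2$ 


Note that when multiplying 
factors such as $\cos\alpha$ and $x_1$, we 
prefer to write the result as 
$x_1 \cos\alpha$ since the alternative 
$\cos\alpha \, x_1$ could be misinterpreted 
as $\cos(\alpha x_1)$. 


It is not recommended that you try to remember this equation, but you 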
should try to remember the process by which the four elements in the 
result are determined from the elements of A and B. This may be 
described by saying that the element in the ith row and the jth column of 
the product is the sum of the element-by-element products of the ith row 
of the first matrix and the jth column of the second matrix. In Figure 36 
this is illustrated by the hand symbols for the case of the element 
$a_{11}b_{12} + a_{12}b_{22}$ in the first row and the second column of the product. All 
the other elements may be worked out in a similar way. 




Figure 36 Obtaining the element in the first row and the second column 
of the product of two matrices 
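
The row-by-column process can be sketched in a few lines. The function name `mat_mul` and the numerical matrices are assumptions made for illustration.

```python
def mat_mul(A, B):
    """Product AB of two 2 x 2 matrices stored as lists of rows."""
    # Element (i, j) is the sum of products of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]      # hypothetical matrices
B = [[5, 6], [7, 8]]
AB = mat_mul(A, B)

# Element in the first row and second column: a11*b12 + a12*b22 = 1*6 + 2*8.
first_row_second_column = AB[0][1]
```

For these values the product is the matrix with rows (19, 22) and (43, 50); the element 22 in the first row and second column is the one picked out in Figure 36.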


Exercise 27 


Work out the following matrix products using the method of Figure 36. 


Ohi] ob aba} eh alp 4 


Successive transformations 


Now consider what happens when we carry out one transformation of the 
plane and then perform another transformation on the result — e.g. a 
rescaling followed by a rotation. Let us represent the first transformation 
by the dilation matrix $\mathbf{D}(\kappa, \lambda)$ given in equation (40), and the second by 
the rotation matrix $\mathbf{R}(\alpha)$ given in equation (41). For the sake of 
definiteness, consider their action on a specific point initially located at the 
position represented by the column vector $\mathbf{x}_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$. This is indicated 
schematically in Figure 37. 


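Suppose that the first transformation, the rescaling, transforms $\mathbf{x}_0$ into the 
position column vector $\mathbf{x}_1 = \mathbf{D}(\kappa, \lambda)\,\mathbf{x}_0$, so that 

\[
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \kappa & 0 \\ 0 & \lambda \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} \kappa x_0 \\ \lambda y_0 \end{bmatrix}. \tag{46}
\]

Also suppose that the second transformation, the rotation, then transforms 
$\mathbf{x}_1$ into the position column vector $\mathbf{x}_2 = \mathbf{R}(\alpha)\,\mathbf{x}_1$, so that 

\[
\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} x_1 \cos\alpha - y_1 \sin\alpha \\ x_1 \sin\alpha + y_1 \cos\alpha \end{bmatrix}.
\]

But from equation (46) we have $x_1 = \kappa x_0$ and $y_1 = \lambda y_0$, so we can rewrite 
this result as 

\[
\mathbf{x}_2 = \mathbf{R}(\alpha)\,\mathbf{D}(\kappa, \lambda)\,\mathbf{x}_0 = \begin{bmatrix} \kappa x_0 \cos\alpha - \lambda y_0 \sin\alpha \\ \kappa x_0 \sin\alpha + \lambda y_0 \cos\alpha \end{bmatrix}.
\]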




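Now, this is significant because it has the same general form as the result 
of applying a single 2 x 2 matrix, C say, to $\mathbf{x}_0$. The effect of C is indicated 
schematically in Figure 38. If our definition of matrix multiplication makes 
sense, we should expect the matrix C to be the product $\mathbf{R}(\alpha)\,\mathbf{D}(\kappa, \lambda)$. Let 
us check to see if this is correct. First, let 

\[
\mathbf{C} = \mathbf{R}(\alpha)\,\mathbf{D}(\kappa, \lambda) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} \kappa & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} \kappa\cos\alpha & -\lambda\sin\alpha \\ \kappa\sin\alpha & \lambda\cos\alpha \end{bmatrix}.
\]

Then note that 

\[
\mathbf{C}\mathbf{x}_0 = \begin{bmatrix} \kappa\cos\alpha & -\lambda\sin\alpha \\ \kappa\sin\alpha & \lambda\cos\alpha \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} \kappa x_0 \cos\alpha - \lambda y_0 \sin\alpha \\ \kappa x_0 \sin\alpha + \lambda y_0 \cos\alpha \end{bmatrix} = \mathbf{x}_2.
\]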


This confirms our expectations. The conclusion is clear and can be stated 
in general terms as follows. 


The effect on a column vector x of a first transformation, represented 
by A, followed by a second transformation, represented by B, is 
equivalent to the effect of a single transformation, represented by the 
matrix product BA. 


Pay close attention to the order of the matrices in the above result. The 
first matrix to act, A, is the one that appears on the right in the product 
BA. This may look a little odd if you are not familiar with matrices, but 
it has to be so, since we are constructing a product that will act on any 
column vector that appears even further to the right in the combination 
BAx. 
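
The equivalence of the two routes can be checked numerically. In this sketch the factors 2 and 3, the angle π/6 and the starting point (1, 1) are illustrative assumptions, as are the function names; comparisons are made to within floating-point rounding.

```python
import math

def dilation(kappa, lam):
    """Dilation matrix D(kappa, lam)."""
    return [[kappa, 0.0], [0.0, lam]]

def rotation(alpha):
    """Rotation matrix R(alpha), anticlockwise by alpha radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def mat_vec(A, x):
    """Product of a 2 x 2 matrix A and a 2 x 1 column vector x."""
    (a, b), (c, d) = A
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

def mat_mul(A, B):
    """Product AB of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

D = dilation(2.0, 3.0)          # first transformation (acts first)
R = rotation(math.pi / 6)       # second transformation
x0 = (1.0, 1.0)

x1 = mat_vec(D, x0)             # rescale first ...
x2 = mat_vec(R, x1)             # ... then rotate
C = mat_mul(R, D)               # single matrix: second times first
x2_direct = mat_vec(C, x0)

# Multiplying in the other order gives a different point.
wrong_order = mat_vec(mat_mul(D, R), x0)
```

Here `x2` and `x2_direct` agree, whereas `wrong_order` does not, which illustrates why the matrix that acts first must appear on the right.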


Exercise 28 


Find a matrix C that can act on a column vector x to reproduce the effect 
of a rescaling represented by D(2,1) followed by a rotation R(§). 


In the example above, and in all the work that led up to it, we have been 
careful to preserve the order of matrices in a matrix product. This is 
important because matrix multiplication is not generally commutative: 
AB may be very different from the product BA. 


Exercise 29 


Find the matrix F that represents the product of the transformations in 
Exercise 28 but in the reverse order, i.e. F = D(2, 1) R(%). Show that, in 


general, F is not the same matrix as C, i.e. D(2,1) R(F) 4 R(F) D(2, 1). 


It is not really surprising that the result of a rotation R() followed by a 
rescaling D(2,1) differs from the result of performing those operations in 
the reverse order. It is symptomatic of the following general rule. 


Figure 38 The effect of the 
combined transformation C 
on $\mathbf{x}_0$ 


Matrix multiplication is non-commutative 


Though there are cases where matrices do commute, generally 


\[
\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}.
\]


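Despite being non-commutative, matrix multiplication is associative, so 
\[
(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C}). \tag{47}
\]
It is also the case that matrix multiplication is distributive over addition, so 
\[
\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}, \tag{48}
\]
and matrix multiplication is linear with respect to scalar multiplication, so 
\[
\mathbf{A}(\lambda\mathbf{B}) = \lambda(\mathbf{A}\mathbf{B}), \quad \text{where } \lambda \text{ is a scalar.} \tag{49}
\]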


As usual, we can use these properties to make sense of matrix expressions 
that involve brackets. 
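
These properties can be spot-checked on particular matrices. A check of this kind is not a proof; the three integer matrices and the helper names below are assumptions chosen only to illustrate equations (47) to (49) and the failure of commutativity.

```python
def mat_mul(A, B):
    """Product AB of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(A, B):
    """Element-by-element sum of two 2 x 2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def scale(lam, A):
    """Scale every element of A by lam."""
    return [[lam * e for e in row] for row in A]

A = [[1, 2], [0, 1]]      # hypothetical matrices
B = [[2, 0], [1, 3]]
C = [[0, 1], [4, -1]]

AB = mat_mul(A, B)        # [[4, 6], [1, 3]]
BA = mat_mul(B, A)        # [[2, 4], [1, 5]]

associative = mat_mul(AB, C) == mat_mul(A, mat_mul(B, C))
distributive = mat_mul(A, add(B, C)) == add(mat_mul(A, B), mat_mul(A, C))
linear = mat_mul(A, scale(5, B)) == scale(5, AB)
```

For this choice AB and BA differ, while the three remaining checks all succeed.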


Matrices are ideally suited to describing situations where the order of 
operations really matters, as in much of science and also in everyday life. 
Do not think of a matrix inequality such as $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$ as something 
frightful. It might just signify that the operations of putting sugar into tea 
cups and drinking tea from those cups do not commute: the order in which 
you do things clearly matters in this situation, and many others. 


Despite the generally non-commutative nature of matrix multiplication, 
there are cases where the matrices do commute, so their order may be 
reversed without changing the result. One such case is when one of the 
transformations is the identity transformation represented by the identity 
matrix I (first introduced in equation (42)). Commutation in this case 
makes good sense, since the identity transformation doesn’t change 
anything, so you would expect it to have the same lack of effect whether it 
was done first or last. As you can easily confirm for any 2 x 2 matrix A, 


IA = AI=A. (50) 


Exercise 30 


Show that products of rotations in two dimensions do commute by 
establishing that 


\[
\mathbf{R}(\alpha)\,\mathbf{R}(\beta) = \mathbf{R}(\beta)\,\mathbf{R}(\alpha) = \mathbf{R}(\alpha + \beta).
\]


3.4 Undoing transformations and matrix inversion 


Often, after performing an operation such as a linear transformation of the 
plane, we need to reverse the changes that have been made and undo the 
transformation. In matrix algebra this is achieved by following the action 
of a matrix, A say, by the action of the inverse matrix, which is 
denoted $\mathbf{A}^{-1}$. The combination of the two, represented by their matrix 
product, results in no change, so it must be equal to the identity matrix I. 


The same must be true if we perform the inverse transformation first, and 
then follow that with the original transformation. Interestingly, inverse 
transformations do not always exist. However, when they do exist, they 
are unique, so what we can say in general is the following. 


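Given a square matrix A, its inverse matrix, if it exists, is 
denoted $\mathbf{A}^{-1}$, has the same order as A, and satisfies the condition 

\[
\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}. \tag{51}
\]

Also, if A and B are square matrices of the same order, and 
$\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} = \mathbf{I}$, then $\mathbf{B} = \mathbf{A}^{-1}$. 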


The process of finding the inverse of a matrix is called matrix inversion. 


In some cases it is easy, even obvious. In other cases it can be difficult or 


simply impossible. A matrix that can be inverted is said to be invertible. 


In this subsection we examine some simple cases, describe a general 
procedure for finding the inverse of a 2 x 2 matrix, and determine the 
criterion for deciding whether a given 2 x 2 matrix is invertible. The 
inversion of larger matrices will be discussed in Section 4. 


Simple matrix inversions 


The inverse of the identity matrix is the identity matrix itself, i.e. 
$\mathbf{I}^{-1} = \mathbf{I}$. This is clear, since $\mathbf{I}\mathbf{I} = \mathbf{I}$. 


The inverse of the dilation matrix $\mathbf{D}(\kappa, \lambda)$ is $\mathbf{D}^{-1}(\kappa, \lambda) = \mathbf{D}(1/\kappa, 1/\lambda)$. 
The inverse of the rotation matrix $\mathbf{R}(\alpha)$ is $\mathbf{R}^{-1}(\alpha) = \mathbf{R}(-\alpha)$. 


All of the above inverses are guaranteed to exist, apart from the inverse 
dilation matrix in the special case that $\kappa = 0$ or $\lambda = 0$. 


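Example 14 


Confirm by matrix multiplication that the matrix $\begin{bmatrix} 1/\kappa & 0 \\ 0 & 1/\lambda \end{bmatrix}$ is the 
inverse of the dilation matrix $\begin{bmatrix} \kappa & 0 \\ 0 & \lambda \end{bmatrix}$, for non-zero $\kappa$ and $\lambda$. 


Solution 


It is sufficient to note that 

\[
\begin{bmatrix} 1/\kappa & 0 \\ 0 & 1/\lambda \end{bmatrix} \begin{bmatrix} \kappa & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} \kappa & 0 \\ 0 & \lambda \end{bmatrix} \begin{bmatrix} 1/\kappa & 0 \\ 0 & 1/\lambda \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]


Exercise 31 


Confirm by explicit matrix multiplication that the inverse of the rotation 
matrix $\mathbf{R}(\alpha)$ is $\mathbf{R}(-\alpha)$. 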


Inversion of a general 2 x 2 matrix 


Consider the matrix product 

\[
\mathbf{A}\mathbf{B} = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}.
\]

It is obvious that A is not the inverse of B, but it is also clear that $\frac{1}{5}\mathbf{A}$ 
will be the inverse of B. This technique, of first identifying a matrix with 
the right structure and then scaling it by an appropriate factor, is the 
basis of the following general rule for finding the inverse of a 2 x 2 matrix. 


Inverse of a 2 x 2 matrix 


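Given the 2 x 2 matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, its inverse matrix, if it exists, is 
given by 

\[
\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{52}
\]
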
Example 15 


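Using the general expressions given above for a 2 x 2 matrix A and its 
inverse $\mathbf{A}^{-1}$, verify by matrix multiplication that $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. 


Solution 

\[
\mathbf{A}^{-1}\mathbf{A} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} da - bc & db - bd \\ -ca + ac & -cb + ad \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]
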
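
The inversion rule of equation (52) can be sketched with exact rational arithmetic, which avoids rounding. The function names `det2`, `inverse2` and `mat_mul` are assumptions, the example matrix is an illustrative choice, and the sketch assumes that the entries are integers.

```python
from fractions import Fraction

def det2(A):
    """The quantity ad - bc for a 2 x 2 matrix A = [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c

def inverse2(A):
    """Inverse of a 2 x 2 integer matrix using equation (52)."""
    (a, b), (c, d) = A
    det = det2(A)
    if det == 0:
        raise ValueError("matrix is not invertible: ad - bc = 0")
    f = Fraction(1, det)          # exact scale factor 1/(ad - bc)
    return [[f * d, -f * b], [-f * c, f * a]]

def mat_mul(A, B):
    """Product AB of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

B = [[4, -3], [-1, 2]]            # illustrative matrix with ad - bc = 5
B_inv = inverse2(B)               # one fifth of [[2, 3], [1, 4]]
left = mat_mul(B_inv, B)          # should be the identity matrix
right = mat_mul(B, B_inv)         # should also be the identity matrix
```

Both products give the identity matrix, as equation (51) requires of an inverse.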


Exercise 32 


Using equation (52) again, verify that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. 


Exercise 33 
1 1 1 0 1 -l 
Let A= [F ies) | and C= |i il 


(a) Find the inverses of A, B, C and D = ABC. 


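(b) Verify that $\mathbf{D}^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}$. (In other words, verify that 
$(\mathbf{A}\mathbf{B}\mathbf{C})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}$.) 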




Note the interesting pattern revealed by this exercise: 


\[
(\mathbf{A}\mathbf{B}\mathbf{C})^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}.
\]


As you will see in Section 4, this is actually a general result: the inverse of 
a product of matrices is equal to the product of the inverses in reverse 
order. 
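
The reversal of order can be seen in a small numerical sketch for a product of two matrices. The matrices are illustrative assumptions with integer entries, and the helper names are likewise hypothetical; exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction

def inverse2(A):
    """Inverse of a 2 x 2 integer matrix using equation (52)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible: ad - bc = 0")
    f = Fraction(1, det)
    return [[f * d, -f * b], [-f * c, f * a]]

def mat_mul(A, B):
    """Product AB of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 5]]      # hypothetical matrices
B = [[2, 1], [1, 1]]

inverse_of_product = inverse2(mat_mul(A, B))
reversed_order = mat_mul(inverse2(B), inverse2(A))   # B^-1 A^-1
same_order = mat_mul(inverse2(A), inverse2(B))       # A^-1 B^-1
```

For these matrices the inverse of AB coincides with the product of the inverses taken in reverse order, but not with the product taken in the original order.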


Criterion for the existence of an inverse 


In equation (52), the general formula for inverting a 2 x 2 matrix A, it is 
necessary to divide every term in a matrix by the quantity ad — bc. This is 
mathematically meaningful only if ad — bc is not equal to zero. That is 
why it is impossible to invert some 2 x 2 matrices. As you will see later, 
the quantity ad — bc for the matrix A is a particularly simple example of a 
mathematical entity called a determinant. Every square matrix A has a 
determinant, usually denoted det A, but only in the case of 2 x 2 matrices 
can it be generally expressed as simply as ad — bc. Using the determinant 
we can say the following. 


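Criterion for the existence of $\mathbf{A}^{-1}$ 


Given the 2 x 2 matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, its inverse matrix $\mathbf{A}^{-1}$ exists if 
and only if 
\[
\det \mathbf{A} = ad - bc \neq 0. \tag{53}
\]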


From the matrix inversion formula equation (52), it is clear why ad — bc is 
important, but in the case of 2 x 2 matrices it is possible to get a deeper 
and mathematically more interesting insight into the origin of the 
existence criterion. Viewed as a transformation of the plane, the effect of 
the matrix A is to map the unit vectors (1,0) and (0,1) into the vectors 
(a,c) and (b,d), as indicated in Figure 39. 



Figure 39 The geometric effect of a general linear transformation of the 
plane 


As a result, a unit square of area 1 is mapped into a parallelogram of area 
|ad − bc|. (You established that the area of a parallelogram is |ad − bc| in 
Exercise 19.) 

Consequently, if the matrix A that describes the mapping has det A = 0, 
so ad − bc = 0, then the area of that parallelogram must also be zero. This 
means that the corners of the parallelogram must be either on the same 
line (i.e. collinear), or all at the same point. This is indicated in Figure 40. 
In this extreme case, the action of A has been to collapse the unit square 
to such an extent that there is insufficient information in the resulting 
‘parallelogram’ (actually either a line or a point) to allow the original unit 
square to be reconstructed by the inverse transformation $\mathbf{A}^{-1}$. That is 
why the inverse transformation cannot exist. 
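
The loss of information can be made concrete with a short sketch. The two matrices below, one with non-zero determinant and one with zero determinant, are illustrative assumptions, as are the function names.

```python
def det2(A):
    """The determinant ad - bc of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    return a * d - b * c

def mat_vec(A, x):
    """Product of a 2 x 2 matrix A and a 2 x 1 column vector x."""
    (a, b), (c, d) = A
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

stretch = [[2, 1], [0, 3]]       # hypothetical matrix with determinant 6
collapse = [[3, 6], [1, 2]]      # hypothetical matrix with determinant 0

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
images = [mat_vec(collapse, p) for p in unit_square]

# Every image point (x, y) satisfies x = 3y, so the images are collinear.
collinear = all(x == 3 * y for x, y in images)

# Two different points share one image, so no inverse can separate them.
shared_image = mat_vec(collapse, (2, 0)) == mat_vec(collapse, (0, 1))
```

Since the points (2, 0) and (0, 1) are both sent to (6, 2), no transformation could send (6, 2) back to a single original point.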




Figure 40 The geometric effect of a linear transformation with zero 
determinant; the parallelogram is reduced to a line or a point 


Exercise 34 


Which of the following matrices has no inverse? 

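\[
\begin{bmatrix} 1 & -2 \\ 2 & 4 \end{bmatrix}, \quad
\begin{bmatrix} -2 & 4 \\ -1 & 2 \end{bmatrix}, \quad
\begin{bmatrix} 4 & 1 \\ 2 & -2 \end{bmatrix}, \quad
\begin{bmatrix} -2 & 1 \\ 4 & 2 \end{bmatrix}.
\]
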
Exercise 35 


Referring to the three matrices A, B and C introduced in Exercise 33, 
calculate the determinants of A, B, C and ABC. Verify that 
det(ABC) = det A det B det C. 


Exercise 36 
The determinant of 
1 2 
a=[ 
vanishes. What is the effect of this transformation on the Cartesian unit 


vectors? Does A indeed transform the unit square to a geometric object 
with no area? 


4 Matrix algebra 




Section 3 was concerned with examples of matrix action that were easy to 
visualise, so the discussion was mostly confined to 2 x 2 matrices that 
could be interpreted geometrically. Such transformations are important, 
but they are only one indication of the great wealth of applications of 
matrix algebra in general. There are many applications of matrices that do 
not involve geometry, and many that involve matrices larger than 2 x 2. 


An example from biology 


As a brief example, consider a biologist studying the effects of 
nutrition on a group of animals. Suppose that two nutrients $n_1$ and 
$n_2$ are being fed to the animals in two different foodstuffs $f_1$ and $f_2$. 
Also suppose that the mass of nutrient $n_i$ per kilogram of foodstuff $f_j$ 
is $a_{ij}$. The situation is depicted schematically in Figure 41. 


Figure 41 Foodstuff $f_1$ contains a mass $a_{11}$ of 
nutrient $n_1$ (red circle), and a mass $a_{21}$ of 
nutrient $n_2$ (blue square); similarly for 
foodstuff $f_2$ 


A feeding scheme that supplies each animal with a mass $M_j$ of 
foodstuff $f_j$ will automatically supply each animal with a mass $m_i$ of 
nutrient $n_i$ given by $m_i = a_{i1}M_1 + a_{i2}M_2$. You should recognise this 
as the ith element in the matrix product 

\[
\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \end{bmatrix}. \tag{54}
\]


It would take quite a lot of effort to use matrix mathematics for such 
a simple problem. Suppose, however, that the biologist is actually 
interested in 20 nutrients being delivered in different proportions by 
20 foodstuffs. The delivery of nutrients by a given combination of 
foodstuffs would then be described by the product of a 20 x 20 
matrix A and a 20 x 1 feeding matrix M. Matrix multiplication 
would provide a useful tool for keeping track of such a complicated 
arrangement, which would be described by a system of 20 equations. 
In addition, the problem of finding the combination of foodstuffs that 
ensures the delivery of the required amounts of each nutrient might 
be tackled by determining the inverse of the 20 x 20 matrix. 


Similar (less contrived) problems arise every day in a range of activities. 
As a result, the study of the matrix algebra of general m x n matrices is 
an important part of almost every course in higher mathematics, pure or 
applied. In addition, the algebraic principles that it teaches extend beyond 
the domain of matrices, rich as that is. In particular, matrix algebra is 
now recognised as the principal example of a broader subject known as 
linear algebra, which will be introduced in Unit 5. 


Many aspects of 2 x 2 matrix algebra were covered in Section 3. In this
section we generalise to the case of m x n matrices.

Figure 42 A visual reminder of the rule for the existence and order of a
matrix product. Here A has order 5 x 2, B has order 2 x 3, and AB has
order 5 x 3. (The orders are written side by side as (5 x 2)(2 x 3): the two
inner integers are equal, so the product exists, and the two outer integers
give its order, 5 x 3.)

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4.1 Notation and fundamentals of matrix algebra 


As usual, we represent a matrix of order m x n (i.e. m rows and
n columns) by a bold symbol, such as A, with $a_{ij}$ representing the element
in its ith row and jth column. In an extension of our earlier notation we
henceforth use the notation $[a_{ij}]$ to indicate the whole matrix of
elements $a_{ij}$. So writing $\mathbf{A} = [a_{ij}]$ means exactly the same as equation (38).
Similarly, we write $\mathbf{B} = [b_{ij}]$ for a matrix B of elements $b_{ij}$, etc.


Concepts such as equating matrices, adding matrices (of the same order) 
and scaling matrices are all easily generalised to the case of m x n 
matrices. We will state the rules describing them later, but there will be 
no surprises. Slightly more challenging is the generalisation of matrix 
multiplication, so we deal with that first. 


The first thing to remember about the general case of matrix 
multiplication is that it is possible only when the matrices involved are of 
the appropriate orders. As indicated by the hand symbols in Figures 32 
and 36, matrix multiplication involves summing the products of 
corresponding elements from a row of the first matrix and a column of the 
second matrix. This is possible only if the number of columns in the first 
matrix is equal to the number of rows in the second. This is embodied in 
the following rule concerning the order of matrix products. 


Rule for the existence and order of a matrix product 


An m x q matrix A may be multiplied by an r x n matrix B to form
the matrix product AB if and only if q = r, in which case the product
AB will be of order m x n.


Because of this rule, when considering multiplying A by B, you may find it 
helpful to picture something like Figure 42, with the order of A written to 
the left of the order of B. This will produce a row of four integers. If the 
two inner integers are equal, the matrices can be multiplied together. In 
such cases, the order of the matrix product AB is given by the two outer 
integers. 
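
The row-of-four-integers check can be sketched in a few lines of Python. This is only an illustration: the function name `product_order` and the use of (rows, columns) tuples are assumptions made here, not notation from the unit.

```python
from typing import Optional, Tuple


def product_order(order_a: Tuple[int, int],
                  order_b: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Return the order of AB, or None if the product does not exist."""
    m, q = order_a  # A is m x q
    r, n = order_b  # B is r x n
    if q != r:      # the two inner integers must be equal
        return None
    return (m, n)   # the two outer integers give the order of AB


print(product_order((5, 2), (2, 3)))  # (5, 3)
print(product_order((2, 3), (5, 2)))  # None
```

Swapping the two orders in the second call shows that the existence of AB says nothing about the existence of BA.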


Supposing that the matrices A and B are suitable for multiplication, how 
should the result be determined? The answer is based on a straightforward 
generalisation of the method used for multiplying 2 x 2 matrices in 
Section 3. It is embodied in the following rule. 


Rule for the evaluation of a matrix product 


The element in the ith row and jth column of the matrix product AB
is the sum of the element-by-element products of the ith row of A
with the jth column of B.


This way of combining rows and columns should already be familiar from
earlier examples, but the whole idea is illustrated again, schematically, in
Figure 43.



Figure 43 A visual prompt to help you to remember the rule for the 
evaluation of a matrix product 


The consequence of the rule can be represented formally (but less
helpfully) by saying that when an m x r matrix A multiplies an r x n
matrix B to form the m x n matrix product AB = C, the element $c_{ij}$ in
the ith row and jth column of C is given by

\[
c_{ij} = \sum_{k=1}^{r} a_{ik} b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}. \tag{55}
\]

Example 16 


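Consider the matrix product

\[
\begin{bmatrix} 2 & -1 & 3 \\ -2 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 3 & -1 \\ 2 & 2 \\ -2 & -2 \end{bmatrix}.
\]

Write down the order of the resulting matrix, and evaluate the product.

Solution

The product of a 2 x 3 matrix and a 3 x 2 matrix is a 2 x 2 matrix:

\[
\begin{aligned}
\begin{bmatrix} 2 & -1 & 3 \\ -2 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 3 & -1 \\ 2 & 2 \\ -2 & -2 \end{bmatrix}
&= \begin{bmatrix} 2(3) - 1(2) + 3(-2) & 2(-1) - 1(2) + 3(-2) \\ -2(3) + 1(2) + 2(-2) & -2(-1) + 1(2) + 2(-2) \end{bmatrix} \\
&= \begin{bmatrix} 6 - 2 - 6 & -2 - 2 - 6 \\ -6 + 2 - 4 & 2 + 2 - 4 \end{bmatrix} \\
&= \begin{bmatrix} -2 & -10 \\ -8 & 0 \end{bmatrix}.
\end{aligned}
\]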
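
The row-into-column rule can also be written as a short program. The following is a minimal sketch, assuming matrices are stored as lists of rows; the name `mat_mul` is chosen here for illustration. It is applied to the two matrices of Example 16 as read from the text.

```python
def mat_mul(a, b):
    """Multiply matrices given as lists of rows, using the row-into-column rule."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    if cols_a != rows_b:
        raise ValueError("columns of the first matrix must equal rows of the second")
    # Element (i, j) is the sum over k of a[i][k] * b[k][j].
    return [[sum(a[i][k] * b[k][j] for k in range(cols_a))
             for j in range(cols_b)]
            for i in range(rows_a)]


A = [[2, -1, 3],
     [-2, 1, 2]]          # order 2 x 3
B = [[3, -1],
     [2, 2],
     [-2, -2]]            # order 3 x 2

print(mat_mul(A, B))      # [[-2, -10], [-8, 0]]
```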




Don’t try to remember this 
formula. Remember instead the 
method shown in Figure 43 that 
underlies it. 
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Exercise 37 


Evaluate the following matrix products, where they exist. 


@® fab a) © e alos] © bye -4 


@ [os a}) 


As you saw in Section 3, matrix multiplication is associative
(so (AB)C = A(BC)) and distributive over matrix addition
(so A(B + C) = AB + AC). Matrix multiplication is also linear with
regard to scalar multiplication (so $\mathbf{A}(\lambda\mathbf{B}) = \lambda\mathbf{A}\mathbf{B}$). However, matrix
multiplication is not generally commutative. This last point is so
important that it deserves its own box, even though you have seen it
before.


In general,

\[
\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A},
\]

although there are cases where matrices do commute.


Exercise 38 
In three-dimensional space, rotations do not, in general, commute. For
example, consider the matrices

\[
\mathbf{A} = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
\mathbf{B} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}.
\]

The first matrix transforms space by rotation of $\pi/2$ about the z-axis; the
second matrix transforms space by rotation of $\pi/2$ about the x-axis.

Contrast this with rotations in two dimensions, which do commute, as we
showed in Exercise 30.


Let C be the transformation of performing A and then B; let D be the 
transformation of performing B and then A. Calculate the matrices C 
and D, and hence decide if A and B commute. 


Transposing a matrix 


In another extension to our earlier notation, we introduce the operation of 
taking the transpose, which interchanges the rows and columns of a matrix, 
so that the first row of a matrix becomes the first column of the 
transposed matrix, and so on. This operation is indicated by a
superscript T, so we can write
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if 
a bl’ fae ’. & 2 
co dl ~lo al and —6 1] = 
0 4 


This useful operation has many applications, including allowing us to save
space by writing potentially long column matrices as transposed row
matrices, as in $\mathbf{L} = [1 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0]^T$.


Using our compact notation for matrices, with $\mathbf{A} = [a_{ij}]$, we can define the
transpose as follows.


Given any matrix A, its transpose $\mathbf{A}^T$ is defined by $[a_{ij}]^T = [a_{ji}]$.


$\mathbf{A}^T$ is read as ‘A transpose’.


Example 17 


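Show that in the case that A is a 2 x 2 matrix and x is a 2 x 1 matrix, it
will always be the case that $(\mathbf{A}\mathbf{x})^T = \mathbf{x}^T\mathbf{A}^T$.

Solution

Let $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}$. Then

\[
\mathbf{A}\mathbf{x} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix},
\]

so

\[
(\mathbf{A}\mathbf{x})^T = \begin{bmatrix} ax + by & cx + dy \end{bmatrix}.
\]

But

\[
\mathbf{x}^T = \begin{bmatrix} x & y \end{bmatrix}
\quad\text{and}\quad
\mathbf{A}^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix},
\]

so

\[
\mathbf{x}^T\mathbf{A}^T = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & c \\ b & d \end{bmatrix}
= \begin{bmatrix} xa + yb & xc + yd \end{bmatrix}.
\]

So it is true that $(\mathbf{A}\mathbf{x})^T = \mathbf{x}^T\mathbf{A}^T$ in this case.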


In fact, this example illustrates the following general result concerning the 


transposition of matrix products. 


If the matrix product AB exists, then its transpose $(\mathbf{A}\mathbf{B})^T$ is given by

\[
(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T.
\]
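
The reversal of order can be checked numerically for non-square matrices as well. The sketch below is illustrative only: the helper names `transpose` and `mat_mul` and the two test matrices are assumptions made here, not taken from the unit.

```python
def transpose(m):
    """Interchange the rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_mul(a, b):
    """Row-into-column matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2, 0],
     [3, -1, 4]]          # order 2 x 3
B = [[2, 1],
     [0, -3],
     [5, 2]]              # order 3 x 2

lhs = transpose(mat_mul(A, B))              # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))   # B^T A^T

print(lhs)         # [[2, 26], [-5, 14]]
print(lhs == rhs)  # True
```

Note that the product of the transposes in the original order would be a 3 x 3 matrix here, so it could not possibly equal the 2 x 2 matrix $(\mathbf{A}\mathbf{B})^T$.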


General rules of matrix algebra 


We are now in a position to summarise the general rules of matrix algebra,
including those for scaling and adding that were promised earlier. In
summarising the rules, we again use the compact notation in which
$\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$.




Notice the reversed sequence of 
the matrices: the transpose of a 
product is the product of the 
transposes in reverse order. 




Basic rules of matrix algebra 


• Two matrices $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ are equal if they have the
same order m x n, and

\[
a_{ij} = b_{ij} \quad\text{for all } i = 1, 2, \ldots, m \text{ and } j = 1, 2, \ldots, n.
\]

• If $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ are any two matrices of the same order,
then their matrix sum is defined by

\[
\mathbf{A} + \mathbf{B} = [a_{ij} + b_{ij}] \quad\text{(i.e. element-wise addition).}
\]

• If $\mathbf{A} = [a_{ij}]$ is any matrix, then there exists a zero matrix 0, of
the same order, such that

\[
\mathbf{A} + \mathbf{0} = \mathbf{A} \quad\text{(i.e. the elements are unaltered by adding } \mathbf{0}\text{).}
\]

• If $\mathbf{A} = [a_{ij}]$ is any matrix and k is any scalar, then the scalar
multiplication of A by k is defined by

\[
k\mathbf{A} = [k a_{ij}] \quad\text{(i.e. element-wise multiplication by } k\text{).}
\]

• If $\mathbf{A} = [a_{ij}]$ is a matrix of order m x r and $\mathbf{B} = [b_{ij}]$ is a matrix of
order r x n, then the matrix product C = AB exists and is a
matrix of order m x n, where the element $c_{ij}$ is defined by

\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}.
\]

• If $\mathbf{A} = [a_{ij}]$ is any matrix of order m x n, then there exists an
identity matrix I of order m x m such that

\[
\mathbf{I}\mathbf{A} = \mathbf{A},
\]

and there also exists an identity matrix of order n x n, also
denoted I, such that

\[
\mathbf{A}\mathbf{I} = \mathbf{A}.
\]

• If $\mathbf{A} = [a_{ij}]$ is a matrix of order m x n, then its transpose $\mathbf{A}^T$ is
a matrix of order n x m defined by

\[
[a_{ij}]^T = [a_{ji}] \quad\text{(i.e. the interchange of rows and columns).}
\]

• If the matrix product AB exists, then the transposed product
$(\mathbf{A}\mathbf{B})^T$ also exists, and is given by

\[
(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T \quad\text{(note the reversed order).}
\]

Additionally, note the following.

• The subtraction of a matrix should be interpreted as the
addition of a matrix that has been multiplied by the scalar −1.

• The power of a matrix should be interpreted as repeated
multiplication, as in $\mathbf{A}^2 = \mathbf{A}\mathbf{A}$, $\mathbf{A}^3 = \mathbf{A}\mathbf{A}\mathbf{A}$, etc.

• Expressions involving brackets should be interpreted in the usual
way, though the ordering of products of matrices should not be
changed since matrix multiplication is generally
non-commutative.


Exercise 39 
1 2 2 3) 1 0 
Lett A= ]3 4|,B=J]-1 —-4 and C= |) ‘h 
5 6 3 1 


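(a) Write down $\mathbf{A}^T$, $\mathbf{B}^T$ and $\mathbf{C}^T$.
(b) Verify that $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$.
(c) Verify that $(\mathbf{A}\mathbf{C})^T = \mathbf{C}^T\mathbf{A}^T$.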


4.2 Determinants, inverses and matrix algebra 


Determinants 


The term determinant was introduced into mathematics in 1801 by Carl 
Friedrich Gauss (1777-1855), though the idea can be traced back to the 
eighteenth century and beyond. (You will learn more about Gauss in 

Unit 5, when we discuss a technique known as Gaussian elimination.) 

A ‘determinant’ was originally associated with a general square array of 
numbers. However, following the introduction of matrices in 1850, by the 
British mathematician James Sylvester (1814-1897), and the development 
of matrix theory by Sylvester’s friend and colleague Arthur Cayley 
(1821-1895), the meaning of ‘determinant’ became strongly associated 
with square matrices, and we now generally speak of the ‘determinant of a 
(square) matrix’. This is the spirit in which we introduced determinants in 
Section 3, where we said that the determinant of the 2 x 2 square matrix
$\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ was the quantity $ad - bc$. More generally, according to this modern
view, we have the following.


The determinant of any square matrix A is a single value, denoted 
det A, that may be calculated from the elements of A by following a 
standard prescription. 


The ‘standard prescription’ for evaluating determinants can be expressed 
in various ways. The form that we use is called Laplace’s rule, and will be 
stated later. First, however, let us look at some examples of determinants, 
to see what Laplace’s rule must achieve. 


When using the elements of a square matrix to calculate a determinant, it
is conventional to write those elements between vertical lines (a practice
introduced by Cayley in 1853). Thus if $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, we can write

\[
\det \mathbf{A} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \tag{57}
\]
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This convention allows us to speak of the ‘rows’ and ‘columns’ of a 
determinant, and to describe a determinant with n rows and n columns as 
an n x n determinant, even though, when finally evaluated, the 
determinant is only a single value. 


Determinants may be associated with square matrices of any order. 
However, in practice a very important case is that of a 3 x 3 matrix. As 
you will see later, Laplace’s rule tells us that the determinant of such a 
matrix may be written as follows. 


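Determinant of a 3 x 3 matrix

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix}
- a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix}
+ a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}. \tag{58}
\]


At present we ask you to simply accept this formula. Shortly, you will
learn a simple rule that enables you to construct the determinant of any
n x n matrix.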


Having expressed the 3 x 3 determinant as a linear combination of 2 x 2 
determinants, we can use equation (57) to work out the 2 x 2 determinants 
and thus complete the evaluation. You can see this in the following 
example. 


Example 18 
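If $\mathbf{A} = \begin{bmatrix} 2 & 2 & -1 \\ 3 & 5 & 1 \\ 1 & 2 & 1 \end{bmatrix}$, evaluate det A.

Solution

Using equation (58),

\[
\begin{aligned}
\det \mathbf{A} &= 2 \begin{vmatrix} 5 & 1 \\ 2 & 1 \end{vmatrix}
- 2 \begin{vmatrix} 3 & 1 \\ 1 & 1 \end{vmatrix}
+ (-1) \begin{vmatrix} 3 & 5 \\ 1 & 2 \end{vmatrix} \\
&= 2 \times (5 - 2) - 2 \times (3 - 1) - (6 - 5) \\
&= 6 - 4 - 1 \\
&= 1.
\end{aligned}
\]
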
Exercise 40 


Evaluate the determinants of the following. 


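(a) $\mathbf{A} = \begin{bmatrix} 2 & -1 & 2 \\ 5 & 1 & -1 \\ 2 & 1 & -1 \end{bmatrix}$ (b) $\mathbf{B} = \begin{bmatrix} 0 & 2 & -1 \\ 3 & 0 & -1 \\ 2 & -1 & 0 \end{bmatrix}$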


Examining these examples of 3 x 3 determinant expansions will help you 
to make sense of the expansion of determinants in general. The expression 
on the right-hand side of equation (58) is the sum of three terms. Each of 
those terms involves a 2 x 2 determinant. The origin of those 2 x 2 
determinants is shown in Figure 44. 


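Element $a_{1j}$: $a_{11}$, $a_{12}$, $a_{13}$

Minor $M_{1j}$:
$M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}$,
$M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}$,
$M_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$

Cofactor $C_{1j} = (-1)^{1+j} M_{1j}$: $C_{11} = +M_{11}$, $C_{12} = -M_{12}$, $C_{13} = +M_{13}$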


Figure 44 Expanding a determinant using the first row of elements and 
their cofactors 


As indicated, the first of the 2 x 2 determinants is obtained from the
original 3 x 3 determinant by deleting all the elements in the same row and
column as the element $a_{11}$ and forming the determinant of what remains.
This 2 x 2 determinant is called the minor of the element $a_{11}$, and is
denoted $M_{11}$. In a similar way, the other two 2 x 2 determinants in
Figure 44 are the minors $M_{12}$ and $M_{13}$ of $a_{12}$ and $a_{13}$, respectively. We can
define the minor $M_{ij}$ of any element $a_{ij}$ of a determinant in a similar way,
by deleting row i and column j, and forming the determinant of what
remains.


To obtain the entire expression on the right of equation (58), each of the
relevant minors $M_{ij}$ must first be multiplied by the factor $(-1)^{i+j}$ to
produce $C_{ij} = (-1)^{i+j} M_{ij}$, which is known as the cofactor of $a_{ij}$. (If i + j
is even, then $C_{ij} = M_{ij}$, but if i + j is odd, then $C_{ij} = -M_{ij}$.) Finally, we
must multiply each of the relevant cofactors by the corresponding
element $a_{ij}$, and add the resulting products. In the case of equation (58)
this gives us $a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}$, which reproduces the right-hand
side of the equation, including the alternating signs. The whole process is
described as ‘the expansion of the 3 x 3 determinant by the cofactors of its
first row of elements’.


Laplace’s rule for expanding (and hence evaluating) determinants is a 
simple generalisation of the procedure that has just been described, which 
you have already seen in Example 18 and carried out in Exercise 40. The 
main difference is that the rule tells us how to perform the expansion 
based on any row or column, and applies to a determinant of any n x n 
matrix. The rule may be stated as follows. 
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Although we are free to use any 
row or column when applying 
Laplace’s rule, it is usual to use 
the first row. 
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Laplace’s rule for expanding determinants 


Given an n x n matrix $\mathbf{A} = [a_{ij}]$, its determinant det A may be
expanded in terms of the elements in row i and their cofactors $C_{ij}$ as

\[
\det \mathbf{A} = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}, \tag{59}
\]

or equivalently, in terms of the elements in column j and their
cofactors $C_{ij}$, as

\[
\det \mathbf{A} = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}, \tag{60}
\]

where $C_{ij} = (-1)^{i+j} M_{ij}$, and $M_{ij}$ is the minor obtained by deleting
row i and column j of the original determinant and forming the
determinant of what remains.


For obvious reasons, Laplace’s rule is sometimes called the cofactor 
expansion rule. 
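
Because each minor is itself a smaller determinant, Laplace's rule lends itself to a recursive program. The following Python sketch is one way to write it, assuming matrices are stored as lists of rows and that rows and columns are counted from 0 (so the sign factor $(-1)^{i+j}$ is unchanged). The names `minor` and `det` are chosen here for illustration.

```python
def minor(m, i, j):
    """Matrix left after deleting row i and column j (both counted from 0)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m, row=0):
    """Determinant of a square matrix by cofactor expansion along one row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    # Sum of element times cofactor; sub-determinants expand along their first row.
    return sum((-1) ** (row + j) * m[row][j] * det(minor(m, row, j))
               for j in range(n))


A = [[2, 2, -1],
     [3, 5, 1],
     [1, 2, 1]]

print(det(A))          # 1
print(det(A, row=1))   # 1 (expanding along the second row gives the same value)
```

This approach mirrors the hand calculation, but the amount of work grows very rapidly with n, so it is practical only for small matrices.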


Example 19 
Evaluate $\det \mathbf{A} = \begin{vmatrix} 1 & 0 & 0 & 3 \\ 2 & 2 & -1 & 2 \\ 3 & 5 & 1 & -1 \\ 1 & 2 & 1 & -1 \end{vmatrix}$.


Solution
To simplify the evaluation, we should expand using the first row, since that
contains two zero elements, so it will minimise the work required. Thus

\[
\begin{aligned}
\det \mathbf{A} &= a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} + a_{14}C_{14} \\
&= C_{11} + 3C_{14}.
\end{aligned}
\]

Now $C_{11} = (-1)^{1+1}M_{11} = M_{11}$, where $M_{11}$ is the determinant of the
matrix obtained by crossing out the first row and first column of A, i.e.

\[
C_{11} = M_{11} = \begin{vmatrix} 2 & -1 & 2 \\ 5 & 1 & -1 \\ 2 & 1 & -1 \end{vmatrix}.
\]

This determinant was evaluated in Exercise 40, where we obtained the
result

\[
C_{11} = 3.
\]

Further, $C_{14} = (-1)^{1+4}M_{14} = -M_{14}$, where $M_{14}$ is the determinant of the
matrix obtained by crossing out the first row and fourth column of A, i.e.

\[
C_{14} = -M_{14} = -\begin{vmatrix} 2 & 2 & -1 \\ 3 & 5 & 1 \\ 1 & 2 & 1 \end{vmatrix} = -1,
\]

where we have used the result obtained in Example 18. Hence

\[
\det \mathbf{A} = 3 - 3 = 0.
\]


The evaluation of determinants can be very time consuming, so here are
some rules that exploit the intrinsic symmetry of determinants to speed up
the process.


Rules for the determinant of an n x n matrix A


• Interchanging any two rows or any two columns of A changes the
sign of det A.

• $\det(\mathbf{A}^T) = \det \mathbf{A}$.

• Multiplying any row or any column of A by a scalar k multiplies
det A by k.

• For any scalar k, $\det(k\mathbf{A}) = k^n \det \mathbf{A}$.

• Adding a multiple of one row of A to another row does not
change det A. Nor does the corresponding operation for a pair of
columns.

• If any row or column of A consists entirely of zeros, then
det A = 0.
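
A numerical spot check of these rules is easy to set up. The sketch below is illustrative rather than a proof: the 3 x 3 test matrix, the scalar k = 3 and the helper names are all assumptions made here, and the determinant is computed by first-row cofactor expansion.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 0:
        return 1  # empty determinant, so that the 1 x 1 case works
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def transpose(m):
    return [list(col) for col in zip(*m)]


A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]
k, n = 3, len(A)
d = det(A)

checks = {
    "row interchange changes sign": det([A[1], A[0], A[2]]) == -d,
    "transpose leaves det unchanged": det(transpose(A)) == d,
    "scaling one row scales det by k": det([[k * x for x in A[0]], A[1], A[2]]) == k * d,
    "det(kA) = k^n det A": det([[k * x for x in row] for row in A]) == k ** n * d,
    "adding a multiple of a row": det([A[0], [x + k * y for x, y in zip(A[1], A[0])], A[2]]) == d,
    "zero row gives zero det": det([A[0], [0, 0, 0], A[2]]) == 0,
}

print(d)  # 1
for rule, holds in checks.items():
    print(rule, holds)
```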


Exercise 41 


Use the rules given above to show that the following determinant is zero:

\[
\begin{vmatrix} 1 & 3 & -2 \\ -3 & 2 & 6 \\ -2 & 4 & 4 \end{vmatrix}.
\]

(Hint: Try to make a row or column vanish.)


The application of determinants to vector products 


The expression for a 3 x 3 determinant in equation (58) provides a useful 
mnemonic for the definition of a vector product, given in Subsection 2.2. 


In fact, the vector product of two vectors can be obtained as follows. 


Vector product as a determinant 


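If $\mathbf{b} = b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}$ and $\mathbf{c} = c_x\mathbf{i} + c_y\mathbf{j} + c_z\mathbf{k}$ are two vectors, then
their vector product can be obtained by evaluating the determinant

\[
\mathbf{b} \times \mathbf{c}
= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ b_x & b_y & b_z \\ c_x & c_y & c_z \end{vmatrix}
= \begin{vmatrix} b_y & b_z \\ c_y & c_z \end{vmatrix} \mathbf{i}
- \begin{vmatrix} b_x & b_z \\ c_x & c_z \end{vmatrix} \mathbf{j}
+ \begin{vmatrix} b_x & b_y \\ c_x & c_y \end{vmatrix} \mathbf{k}. \tag{61}
\]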


This reproduces the vector product rule that was given in equation (32) 
(though there we called the vectors a and b). 
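
The mnemonic translates directly into code: each component of the vector product is a signed 2 x 2 determinant. The sketch below assumes vectors are stored as (x, y, z) tuples; the function names `det2` and `cross` are illustrative.

```python
def det2(a, b, c, d):
    """Value of the 2 x 2 determinant with rows (a, b) and (c, d)."""
    return a * d - b * c


def cross(b, c):
    """Vector product b x c, expanding the 3 x 3 determinant by its first row."""
    bx, by, bz = b
    cx, cy, cz = c
    return (det2(by, bz, cy, cz),    # coefficient of i
            -det2(bx, bz, cx, cz),   # coefficient of j (note the minus sign)
            det2(bx, by, cx, cy))    # coefficient of k


print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1), i.e. i x j = k
print(cross((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```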
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A multiple can be negative or 
positive, so the last rule covers 
subtracting a multiple of one 
row from another row. 
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In a similar way, the component expression for the scalar triple product of 
three vectors can be obtained as follows. 


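Scalar triple product as a determinant

\[
\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})
= \begin{vmatrix} a_x & a_y & a_z \\ b_x & b_y & b_z \\ c_x & c_y & c_z \end{vmatrix}
= a_x \begin{vmatrix} b_y & b_z \\ c_y & c_z \end{vmatrix}
- a_y \begin{vmatrix} b_x & b_z \\ c_x & c_z \end{vmatrix}
+ a_z \begin{vmatrix} b_x & b_y \\ c_x & c_y \end{vmatrix}. \tag{62}
\]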


Using these determinant-based expressions, it is easy to find similar 
formulas for the areas of parallelograms and the volumes of 
parallelepipeds. Many other geometric results can also be expressed in 
terms of determinants. 
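
As one illustration of such a formula, the volume of a parallelepiped with edges a, b and c is the magnitude of the scalar triple product, and so can be computed from a single 3 x 3 determinant. The Python sketch below uses made-up vectors and illustrative function names; none of them come from the unit.

```python
def det3(rows):
    """3 x 3 determinant expanded by its first row, as in equation (58)."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))


def scalar_triple_product(a, b, c):
    """a . (b x c), evaluated as the determinant with rows a, b, c."""
    return det3([a, b, c])


def parallelepiped_volume(a, b, c):
    """Volume of the parallelepiped with edges a, b, c (never negative)."""
    return abs(scalar_triple_product(a, b, c))


a, b, c = (1, 0, 0), (1, 2, 0), (1, 1, 3)
print(scalar_triple_product(a, b, c))   # 6
print(scalar_triple_product(c, b, a))   # -6 (interchanging two rows changes the sign)
print(parallelepiped_volume(c, b, a))   # 6
```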


Exercise 42 
Use determinant-based methods to do the following. 
(a) Evaluate r x s, where r = (2,5,0) and s = (2, —1, 2). 


(b) Find the area of the parallelogram with sides defined by the position
vectors $\mathbf{r}_1 = (1, 1, 0)$ and $\mathbf{r}_2 = (2, 2, -2)$.


(c) Find the volume of the parallelepiped with sides defined by
a = (2, 5, 0), b = (1, −2, 0) and c = (1, −3, 2).


Finding inverse matrices 


Determinants are very common in the physical sciences. They are often 
used as a neat way of summarising important results, as you will see 
shortly. However, for our immediate purposes, in the context of matrix 
algebra, determinants are important for the part they play in relation to 
inverse matrices. Here is a procedure for finding the inverse of any square 
invertible matrix, based on a generalisation of the method for 2 x 2 
matrices in Section 3. (It is worth noting that for large matrices there are 
other methods that are computationally more efficient, and more likely to 
be used in practice.) 


Procedure 1 Finding the inverse of an n x n matrix 


Suppose that we are given the n x n matrix $\mathbf{A} = [a_{ij}]$.


1. Evaluate det A, and confirm that $\det \mathbf{A} \neq 0$. (If det A = 0, the
matrix A is non-invertible; no inverse exists, and you should
abandon the attempt to find it.)


2. Evaluate the cofactor $C_{ij}$ of each element $a_{ij}$, using the relation
$C_{ij} = (-1)^{i+j} M_{ij}$, where $M_{ij}$ is the minor obtained by deleting
row i and column j of the original determinant and forming the
determinant of what remains.


3. Form the n x n square matrix $\mathbf{C} = [C_{ij}]$, where the element of C
in row i and column j is the cofactor $C_{ij}$.


4. Take the transpose of C to obtain the matrix $\mathbf{C}^T$.

5. Scale the matrix $\mathbf{C}^T$ by 1/det A to obtain the inverse of A:

\[
\mathbf{A}^{-1} = \frac{1}{\det \mathbf{A}}\, \mathbf{C}^T. \tag{63}
\]


Note that given a square matrix A, the matrix C7 is sometimes called the 
adjugate matrix of A, represented by adj A. For that reason you will 
sometimes see the inverse of A written (elsewhere) as A~! = adj A/det A. 
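
The five steps of Procedure 1 can be followed literally in a short program. The Python sketch below is a minimal illustration, assuming a square matrix with integer entries stored as a list of rows; exact fractions are used so that no rounding occurs. The function names are chosen here, and, as noted above, this is not the method that would be used for large matrices.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix left after deleting row i and column j (both counted from 0)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 0:
        return 1  # empty determinant, so that the 1 x 1 case works
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse(m):
    """Inverse of an integer square matrix via the transposed matrix of cofactors."""
    n = len(m)
    d = det(m)                                   # step 1
    if d == 0:
        raise ValueError("matrix is non-invertible (det A = 0)")
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]                    # steps 2 and 3
    # Steps 4 and 5: transpose the cofactor matrix and scale by 1/det A.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


A = [[2, 2, -1],
     [3, 5, 1],
     [1, 2, 1]]
for row in inverse(A):
    print([str(x) for x in row])
# ['3', '-4', '7']
# ['-2', '3', '-5']
# ['1', '-2', '4']
```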


Example 20 


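Use Procedure 1 to derive the inverse of a 2 x 2 matrix, given in
equation (52).

Solution

For a general 2 x 2 matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the minors are

\[
M_{11} = d, \quad M_{12} = c, \quad M_{21} = b, \quad M_{22} = a.
\]

Hence the matrix of the cofactors is

\[
\mathbf{C} = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix},
\]

and its transpose is

\[
\mathbf{C}^T = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]

The determinant of A is $\det \mathbf{A} = ad - bc$, so the inverse of A is

\[
\mathbf{A}^{-1} = \frac{1}{\det \mathbf{A}}\, \mathbf{C}^T = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix},
\]

in agreement with equation (52).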


Example 21 
In Example 18 we showed that the matrix

\[
\mathbf{A} = \begin{bmatrix} 2 & 2 & -1 \\ 3 & 5 & 1 \\ 1 & 2 & 1 \end{bmatrix}
\]

has det A = 1, so it is an invertible matrix. Find $\mathbf{A}^{-1}$.
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Solution 
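The cofactors of A are

\[
\begin{aligned}
C_{11} &= +\begin{vmatrix} 5 & 1 \\ 2 & 1 \end{vmatrix} = 3, &
C_{12} &= -\begin{vmatrix} 3 & 1 \\ 1 & 1 \end{vmatrix} = -2, &
C_{13} &= +\begin{vmatrix} 3 & 5 \\ 1 & 2 \end{vmatrix} = 1, \\
C_{21} &= -\begin{vmatrix} 2 & -1 \\ 2 & 1 \end{vmatrix} = -4, &
C_{22} &= +\begin{vmatrix} 2 & -1 \\ 1 & 1 \end{vmatrix} = 3, &
C_{23} &= -\begin{vmatrix} 2 & 2 \\ 1 & 2 \end{vmatrix} = -2, \\
C_{31} &= +\begin{vmatrix} 2 & -1 \\ 5 & 1 \end{vmatrix} = 7, &
C_{32} &= -\begin{vmatrix} 2 & -1 \\ 3 & 1 \end{vmatrix} = -5, &
C_{33} &= +\begin{vmatrix} 2 & 2 \\ 3 & 5 \end{vmatrix} = 4.
\end{aligned}
\]

Thus the matrix of cofactors is

\[
\mathbf{C} = [C_{ij}] = \begin{bmatrix} 3 & -2 & 1 \\ -4 & 3 & -2 \\ 7 & -5 & 4 \end{bmatrix}.
\]

Since we already know that in this case det A = 1, it follows from
Procedure 1 that

\[
\mathbf{A}^{-1} = \mathbf{C}^T = \begin{bmatrix} 3 & -4 & 7 \\ -2 & 3 & -5 \\ 1 & -2 & 4 \end{bmatrix}.
\]

Though not required, it is always good practice to confirm the result
$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$ by explicit matrix multiplication. In this case we get

\[
\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 2 & 2 & -1 \\ 3 & 5 & 1 \\ 1 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 3 & -4 & 7 \\ -2 & 3 & -5 \\ 1 & -2 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]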


Exercise 43 


Find the inverse of the matrix A of Exercise 40(a), where we showed that 
det A = 3. 


We should also note the following rules relating to matrices, inverses and 
determinants. 


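Rules for n x n matrices, inverses and determinants


• The matrix A can be inverted if and only if $\det \mathbf{A} \neq 0$, in which
case $\det(\mathbf{A}^{-1}) = 1/\det \mathbf{A}$.


• For two n x n matrices A and B, we have
$\det(\mathbf{A}\mathbf{B}) = \det \mathbf{A} \det \mathbf{B}$.


• If $\det(\mathbf{A}\mathbf{B}) \neq 0$, so $(\mathbf{A}\mathbf{B})^{-1}$ exists, then $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.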


Note the reversal of order in the last of these rules. This is very similar to 
the reversal that we saw when expressing the transpose of a product in 
terms of the product of the transposes. 
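
These rules are easy to spot-check for 2 x 2 matrices using the formula $ad - bc$ and the 2 x 2 inverse from Example 20. The Python sketch below uses two made-up matrices and illustrative helper names, with exact fractions to avoid rounding; it illustrates the rules for one pair of matrices and does not prove them.

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def inv2(m):
    """Inverse of a 2 x 2 matrix: swap a and d, negate b and c, divide by det."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is non-invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[2, 1], [1, 1]]   # det A = 1
B = [[1, 2], [3, 4]]   # det B = -2
AB = mul2(A, B)

print(det2(AB) == det2(A) * det2(B))          # True
print(inv2(AB) == mul2(inv2(B), inv2(A)))     # True
print(inv2(AB) == mul2(inv2(A), inv2(B)))     # False: the order matters
print(det2(inv2(B)) == Fraction(1, det2(B)))  # True
```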
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Learning outcomes 


After studying this unit, you should be able to do the following. 


Understand the meaning of the terms scalar, vector, displacement 
vector, unit vector and position vector, and know what it means to say 
that two vectors are equal. 


Use vector notation and represent vectors as arrows on diagrams. 


Scale a vector by a scalar, and add two vectors geometrically using the 
triangle rule. 


Resolve a vector into its Cartesian components, and scale and add 
vectors given in Cartesian component form. 


Write down the vector equation of a straight line through two given 
points. 


Calculate the scalar product and vector product of two given vectors. 


Determine whether or not two given vectors are perpendicular or 
parallel to one another. 


Determine the magnitude of a vector and the angle between the 
directions of two vectors. 


Resolve a vector in a given direction. 


Use the vector product to determine the area of a parallelogram and 
the volume of a parallelepiped. 


Understand that a matrix can be used to represent a linear 
transformation, and know what this means geometrically for a 2 x 2 
matrix. 


Add, subtract and multiply matrices of suitable sizes, and multiply a 
matrix by a scalar. 


Understand the terms transpose of a matrix, zero matrix, identity 
matrix, inverse matrix, invertible matrix and non-invertible matrix. 


Evaluate the determinants and inverses of 2 x 2 and 3 x 3 matrices, 
and know how to perform such calculations for n x n matrices. 


Use the determinant of a matrix to evaluate vector products, areas 
and volumes. 
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Solution to Exercise 1 


A sketch map is shown below. 


[Sketch map showing Milton Keynes, Oxford, London and Brighton, with
distances of 47 km, 133 km and 74 km marked.]


From the map, the approximate distance between Milton Keynes and
London is 80 km, and the displacement from Milton Keynes to London is
about 80 km South-East. The distance between Milton Keynes and
London can be described as the magnitude of the displacement from
Milton Keynes to London.
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Solution to Exercise 2 


(a) The equation h = 2a — 3b is interpreted as h = 2a + 3(—b), so h can 
be represented by the red arrow shown in the figure below. 


The arrow for a is 3 units long, and the arrow for b is 2 units long. So 
the arrow for 2a is 6 units long, and the arrow for —3b is 6 units long. 
These two arrows are perpendicular. The arrow for h forms the 
hypotenuse of a right-angled triangle, so we have 


\[
|\mathbf{h}| = \sqrt{6^2 + 6^2} = 6\sqrt{2}.
\]


(b) Let $\theta$ be the angle between the directions of h and 2a. Then the
diagram shows that $\tan\theta = 6/6 = 1$, so $\theta = \pi/4$ radians.


Solution to Exercise 3 


The positive y-axis will be on your left. 


Solution to Exercise 4 
Systems (b), (c) and (d) are right-handed. System (a) is left-handed. 
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Solution to Exercise 5 
(a) By visual inspection, the vectors are 


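\[
\mathbf{a} = 2\mathbf{i} + \mathbf{j} = (2, 1) \quad\text{and}\quad \mathbf{b} = -2\mathbf{i} - 3\mathbf{j} = (-2, -3).
\]

(b) Using equation (4), the magnitudes are

\[
a = \sqrt{2^2 + 1^2} = \sqrt{5} \quad\text{and}\quad b = \sqrt{(-2)^2 + (-3)^2} = \sqrt{13}.
\]

Using equation (5), the angle between a and the positive x-direction is
given by

\[
\cos\theta_a = a_x/a = 2/\sqrt{5} = 0.8944 \text{ (to 4 d.p.)},
\]

so

\[
\theta_a = \arccos(0.8944) = 0.464 \text{ radians}.
\]

For b, equation (5) gives

\[
\cos\theta_b = b_x/b = -2/\sqrt{13} = -0.5547 \text{ (to 4 d.p.)},
\]

so

\[
\theta_b = \arccos(-0.5547) = 2.159 \text{ radians}.
\]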


Solution to Exercise 6 

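(a) $|\mathbf{a}| = \sqrt{2^2 + (-1)^2} = \sqrt{5}$,
$|\mathbf{b}| = \sqrt{1^2 + 3^2 + 5^2} = \sqrt{35}$.

(b) We have

\[
\begin{aligned}
\theta_x &= \arccos\left(\frac{a_x}{|\mathbf{a}|}\right) = \arccos\left(\frac{2}{\sqrt{5}}\right) = 0.4636 \text{ radians}, \\
\theta_y &= \arccos\left(\frac{a_y}{|\mathbf{a}|}\right) = \arccos\left(\frac{-1}{\sqrt{5}}\right) = 2.0344 \text{ radians}, \\
\theta_z &= \arccos\left(\frac{a_z}{|\mathbf{a}|}\right) = \arccos\left(\frac{0}{\sqrt{5}}\right) = \pi/2 \text{ radians}.
\end{aligned}
\]

So a lies in the xy-plane, between the positive x-axis and the negative
y-axis.

(c) $\mathbf{a} + \mathbf{b} = 3\mathbf{i} + 2\mathbf{j} + 5\mathbf{k}$,
$2\mathbf{a} - \mathbf{b} = 3\mathbf{i} - 5\mathbf{j} - 5\mathbf{k}$,
$\mathbf{c} + 2\mathbf{b} - 3\mathbf{a} = -4\mathbf{i} + 10\mathbf{j} + 8\mathbf{k}$.

(d) We have $\overrightarrow{PQ} = 2\mathbf{a} - \mathbf{b} = 3\mathbf{i} - 5\mathbf{j} - 5\mathbf{k}$. The point Q has the position
vector $\overrightarrow{OQ}$, which is given by

\[
\begin{aligned}
\overrightarrow{OQ} &= \overrightarrow{OP} + \overrightarrow{PQ} \\
&= (2\mathbf{j} + 3\mathbf{k}) + (3\mathbf{i} - 5\mathbf{j} - 5\mathbf{k}) \\
&= 3\mathbf{i} - 3\mathbf{j} - 2\mathbf{k},
\end{aligned}
\]

so Q is the point (3, −3, −2).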
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(e) We have RS =a + 2b = 4i+ 5j + 10k. The point S has the position 
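vector $\overrightarrow{OS}$, which is given by

\[
\begin{aligned}
\overrightarrow{OS} &= \overrightarrow{OR} + \overrightarrow{RS} \\
&= (\mathbf{i} + \mathbf{j}) + (4\mathbf{i} + 5\mathbf{j} + 10\mathbf{k}) \\
&= 5\mathbf{i} + 6\mathbf{j} + 10\mathbf{k},
\end{aligned}
\]

so S is the point (5, 6, 10).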


Solution to Exercise 7 


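The magnitude of any vector is given by the positive square root of the
sum of the squares of its components. In the case of the unit vector $\hat{\mathbf{a}} = \mathbf{a}/a$,
which has components $a_x/a$, $a_y/a$ and $a_z/a$, it follows that the magnitude is

\[
|\hat{\mathbf{a}}| = \sqrt{\left(\frac{a_x}{a}\right)^2 + \left(\frac{a_y}{a}\right)^2 + \left(\frac{a_z}{a}\right)^2},
\]

that is,

\[
|\hat{\mathbf{a}}| = \frac{\sqrt{a_x^2 + a_y^2 + a_z^2}}{a} = \frac{a}{a} = 1,
\]

as required.
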
Solution to Exercise 8 


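Relative to the origin of the Cartesian coordinate system, the two points
have position vectors $\mathbf{i} + \mathbf{j} + 2\mathbf{k}$ and $2\mathbf{i} + 3\mathbf{j} + \mathbf{k}$. Thus the vector equation
of the line is

\[
\begin{aligned}
\mathbf{r}(t) &= (1 - t)(\mathbf{i} + \mathbf{j} + 2\mathbf{k}) + t(2\mathbf{i} + 3\mathbf{j} + \mathbf{k}) \\
&= (1 + t)\mathbf{i} + (1 + 2t)\mathbf{j} + (2 - t)\mathbf{k},
\end{aligned}
\]

where $-\infty < t < \infty$.
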
Solution to Exercise 9 


The acceleration is given by 


\[
\mathbf{a}(t) = \frac{d}{dt}\mathbf{v}(t) = \frac{d}{dt}(6t, 4t^3, -1) = (6, 12t^2, 0).
\]


Solution to Exercise 10 


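\[
\begin{aligned}
\mathbf{a} \cdot \mathbf{b} &= |\mathbf{a}||\mathbf{b}| \cos\theta = 2 \times 4 \times \cos\tfrac{\pi}{3} = 4, \\
\mathbf{b} \cdot \mathbf{c} &= |\mathbf{b}||\mathbf{c}| \cos\theta = 4 \times 1 \times \cos\tfrac{\pi}{6} = 2\sqrt{3}, \\
\mathbf{a} \cdot \mathbf{c} &= |\mathbf{a}||\mathbf{c}| \cos\theta = 2 \times 1 \times \cos\left(\tfrac{\pi}{3} + \tfrac{\pi}{6}\right) = 2\cos\tfrac{\pi}{2} = 0, \\
\mathbf{b} \cdot \mathbf{b} &= |\mathbf{b}||\mathbf{b}| \cos\theta = 4 \times 4 \times \cos 0 = 16.
\end{aligned}
\]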


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Solution to Exercise 11 
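(a)
\[
\begin{aligned}
(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) &= \mathbf{a} \cdot (\mathbf{a} - \mathbf{b}) + \mathbf{b} \cdot (\mathbf{a} - \mathbf{b}) \\
&= \mathbf{a} \cdot \mathbf{a} - \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{a} - \mathbf{b} \cdot \mathbf{b} \\
&= \mathbf{a} \cdot \mathbf{a} - \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{b} - \mathbf{b} \cdot \mathbf{b} \\
&= \mathbf{a} \cdot \mathbf{a} - \mathbf{b} \cdot \mathbf{b}.
\end{aligned}
\]

(b) From equation (17),
\[
\begin{aligned}
|\mathbf{a} + \mathbf{b}|^2 &= (\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} + \mathbf{b}) \\
&= \mathbf{a} \cdot (\mathbf{a} + \mathbf{b}) + \mathbf{b} \cdot (\mathbf{a} + \mathbf{b}) \\
&= \mathbf{a} \cdot \mathbf{a} + \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{a} + \mathbf{b} \cdot \mathbf{b} \\
&= \mathbf{a} \cdot \mathbf{a} + \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{b} \\
&= \mathbf{a} \cdot \mathbf{a} + 2\,\mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{b}.
\end{aligned}
\]

(c) When a and b are antiparallel, the angle between them is $\theta = \pi$,
consequently $\cos\theta = -1$ and $\mathbf{a} \cdot \mathbf{b} = -|\mathbf{a}||\mathbf{b}| = -ab$.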


Solution to Exercise 12 
(a) $\mathbf{a} \cdot \mathbf{b} = (4 \times 1) + (1 \times (-3)) + (-5 \times 1) = -4$.


The negative sign tells us that the angle between a and b is between
$\pi/2$ and $\pi$ radians, i.e. it is an obtuse angle.


(b) No, the scalar product is $\mathbf{c} \cdot \mathbf{d} = 9 - 5 + 4 = 8$. Since this is not equal
to zero, the pair of vectors fails the test for orthogonality.


Solution to Exercise 13 

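For orthogonality, we require $(\mathbf{p} + \lambda\mathbf{q}) \cdot \mathbf{r} = 0$. This condition gives

\[
(\mathbf{p} + \lambda\mathbf{q}) \cdot \mathbf{r} = (\mathbf{p} \cdot \mathbf{r}) + \lambda(\mathbf{q} \cdot \mathbf{r}) = 0.
\]

We have

\[
\mathbf{p} \cdot \mathbf{r} = (3\mathbf{i} + 2\mathbf{j} - \mathbf{k}) \cdot (2\mathbf{i} - \mathbf{j} - \mathbf{k}) = 6 - 2 + 1 = 5
\]

and

\[
\mathbf{q} \cdot \mathbf{r} = (-\mathbf{i} + \mathbf{j} + 2\mathbf{k}) \cdot (2\mathbf{i} - \mathbf{j} - \mathbf{k}) = -2 - 1 - 2 = -5,
\]

so $(\mathbf{p} + \lambda\mathbf{q}) \cdot \mathbf{r} = 5 - 5\lambda = 0$, hence $\lambda = 1$.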


Solution to Exercise 14 
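(a) $\mathbf{a} \cdot \mathbf{c} = -2 - 3 + 3 = -2$, $\mathbf{a} \cdot \mathbf{d} = -4 + 0 + 1 = -3$ and
$\mathbf{a} \cdot \mathbf{e} = -2 + 3 - 1 = 0$. Thus only e is perpendicular to a.

(b) We have

\[
\mathbf{a} + 2\mathbf{b} = \mathbf{j} + 9\mathbf{k}.
\]

The displacement from the origin to the point (1, 1, 1) is represented
by the position vector $\mathbf{r} = \mathbf{i} + \mathbf{j} + \mathbf{k}$. The corresponding unit vector is
$\hat{\mathbf{r}} = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k})$. The component of $\mathbf{a} + 2\mathbf{b}$ in the direction of the
specified displacement is therefore

\[
\hat{\mathbf{r}} \cdot (\mathbf{a} + 2\mathbf{b}) = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k}) \cdot (\mathbf{j} + 9\mathbf{k}) = \frac{10}{\sqrt{3}}.
\]

(c) First we need to find a unit vector in the direction of the vector
$\mathbf{a} - 2\mathbf{b}$. Working in ordered triple notation, $\mathbf{a} - 2\mathbf{b} = (4, -7, -7)$ and
its magnitude is $|\mathbf{a} - 2\mathbf{b}| = \sqrt{16 + 49 + 49} = \sqrt{114}$. Consequently, a
unit vector in the direction of $\mathbf{a} - 2\mathbf{b}$ is given by
$(\mathbf{a} - 2\mathbf{b})/\sqrt{114} = (4, -7, -7)/\sqrt{114}$. So the required component is

\[
\begin{aligned}
(\mathbf{a} + 2\mathbf{b}) \cdot \frac{1}{\sqrt{114}}(\mathbf{a} - 2\mathbf{b}) &= (0, 1, 9) \cdot \frac{1}{\sqrt{114}}(4, -7, -7) \\
&= \frac{-70}{\sqrt{114}} \\
&= -6.56 \text{ (to 2 d.p.)}.
\end{aligned}
\]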


Solution to Exercise 15 


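Since v has magnitude 4, and makes an angle of $2\pi/3$ with the positive
x-axis, its component along the i-direction is

\[
\mathbf{v} \cdot \mathbf{i} = |\mathbf{v}||\mathbf{i}| \cos(2\pi/3) = -\frac{4}{2} = -2.
\]

Since v makes an angle of $\pi/6$ with the positive y-axis, its component
along the j-direction is

\[
\mathbf{v} \cdot \mathbf{j} = |\mathbf{v}||\mathbf{j}| \cos(\pi/6) = \frac{4\sqrt{3}}{2} = 2\sqrt{3}.
\]

Hence we can write v as

\[
\mathbf{v} = -2\mathbf{i} + 2\sqrt{3}\,\mathbf{j}.
\]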


Solution to Exercise 16 
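Since $\hat{\mathbf{u}}$, $\hat{\mathbf{v}}$ and $\hat{\mathbf{w}}$ are mutually perpendicular unit vectors, we can write

\[
\mathbf{a} = a_u\hat{\mathbf{u}} + a_v\hat{\mathbf{v}} + a_w\hat{\mathbf{w}},
\]

where $a_u$, $a_v$ and $a_w$ are the components of a along the $\hat{\mathbf{u}}$, $\hat{\mathbf{v}}$ and $\hat{\mathbf{w}}$
directions, respectively. Hence

\[
\begin{aligned}
a_u &= \mathbf{a} \cdot \hat{\mathbf{u}} = \frac{1}{\sqrt{2}}(2, 1, 0) \cdot (1, 0, 1) = \frac{2}{\sqrt{2}} = \sqrt{2}, \\
a_v &= \mathbf{a} \cdot \hat{\mathbf{v}} = \frac{1}{\sqrt{2}}(2, 1, 0) \cdot (1, 0, -1) = \frac{2}{\sqrt{2}} = \sqrt{2}, \\
a_w &= \mathbf{a} \cdot \hat{\mathbf{w}} = (2, 1, 0) \cdot (0, 1, 0) = 1.
\end{aligned}
\]

So

\[
\mathbf{a} = \sqrt{2}\,\hat{\mathbf{u}} + \sqrt{2}\,\hat{\mathbf{v}} + \hat{\mathbf{w}}.
\]

We can check that this solution is correct by substituting in the numerical
values for $\hat{\mathbf{u}}$, $\hat{\mathbf{v}}$ and $\hat{\mathbf{w}}$:

\[
\mathbf{a} = \sqrt{2}\,\frac{1}{\sqrt{2}}(1, 0, 1) + \sqrt{2}\,\frac{1}{\sqrt{2}}(1, 0, -1) + (0, 1, 0) = (2, 1, 0),
\]

as required.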


Solution to Exercise 17 

(a) Since a × b = −b × a for any vectors a and b, we have 
j × i = −k, k × j = −i and i × k = −j. 

(b) By definition, a × a = 0 for any vector a, so we have 


i × i = j × j = k × k = 0. 


(c) (i × (i + k)) − ((i + j) × k) 
= ((i × i) + (i × k)) − ((i × k) + (j × k)) 
= (0 + (−j)) − (−j + i) 
= −i. 

(d) (i + k) × (i + j + k) 
= (i × (i + j + k)) + (k × (i + j + k)) 
= (0 + k + (−j)) + (j + (−i) + 0) 
= −i + k. 


Solution to Exercise 18 


Since the origin is one of the three points, the sides of the triangle will be 
defined by the position vectors of the other two points, that is, by the 
vectors r1 = (2, 1, 1) and r2 = (1, −1, −1). The area of the triangle will be 
half the area of the parallelogram with sides defined by the vectors r1 
and r2. It therefore follows from the expression for the area of a 
parallelogram that the area of the triangle is 


(1/2)|r1 × r2| = (1/2)|(2, 1, 1) × (1, −1, −1)| 
= (1/2)|(1(−1) − 1(−1), 1(1) − 2(−1), 2(−1) − 1(1))| 
= (1/2)√(3² + (−3)²) = 3/√2. 
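

As a numerical cross-check of this working, here is a small sketch that forms the vector product component by component and halves its magnitude. The function names `cross` and `triangle_area` are illustrative, not taken from the text.

```python
import math

def cross(u, v):
    """Vector product of two three-dimensional vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triangle_area(r1, r2):
    """Area of the triangle with corners at the origin, r1 and r2."""
    n = cross(r1, r2)
    return 0.5 * math.sqrt(sum(c * c for c in n))

print(cross((2, 1, 1), (1, -1, -1)))          # (0, 3, -3)
print(triangle_area((2, 1, 1), (1, -1, -1)))  # equals 3/sqrt(2)
```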


Solution to Exercise 19 


Since the origin is one of the corners, two adjoining sides of the 
parallelogram will be defined by the position vectors r1 = (a, b, 0) and 
r2 = (c, d, 0). Using the vector product method, the area of this 
parallelogram is 
|r1 × r2| = |(a, b, 0) × (c, d, 0)| 
= |(b(0) − 0(d), 0(c) − a(0), a(d) − b(c))| 
= |(0, 0, ad − bc)| 
= |ad − bc|. 
If b= c=0, then the corners are (0,0,0), (a,0,0), (0,d,0) and (a,d,0), so 
the parallelogram is a rectangle. The above formula for the area becomes 
|ad|, which is as expected. 


Solution to Exercise 20 


Given that both a × b and c are non-zero, (a × b) × c = 0 implies that 
the angle between the direction of a × b and the direction of c must be 0 
or π radians (so that its sine is zero). However, a × b is always 
perpendicular to the plane containing a and b, so we can say that c, which 
is parallel or antiparallel to a × b, must also be perpendicular to the plane 
containing a and b. 


Solution to Exercise 21 


(a) is 2 x 2 — a square matrix. 


(b) is 3 x 3 — a square matrix. 
(c) is 1 x 2—a row matrix. 
(d) is 3 x 1 — a column matrix. 


Solution to Exercise 22 
wafl-B J)-t 
oaf-6 C16 
oaGl-6 JB Et 
Solution to Exercise 23 


(a) Either by examining the columns of D, or by multiplying i and j 
by D, we see that (1,0) is mapped to (3,0), and (0,1) is mapped to 
(0,2). 


(b) We have 
\[ \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} 9 \\ -4 \end{bmatrix}, \] 
so (3, −2) is mapped to (9, −4). 


(c) The transformation rescales the sides of a unit square in proportion to 
the rescaling of the unit vectors. Since (1,0) is mapped to (3,0), and 
(0,1) is mapped to (0,2), the area of the unit square will be enlarged 
from 1 × 1 = 1 to 3 × 2 = 6. 


Solution to Exercise 24 


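(a) With α = π/4, the required rotation matrix is 
\[ R(\pi/4) = \begin{bmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}. \] 


(b) From the columns of R(π/4) (or by using matrix multiplication) it can 
be seen that (1,0) is mapped to (1/√2, 1/√2), and (0,1) is mapped to 
(−1/√2, 1/√2). 


(c) We have 
\[ R(\pi/4) \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ -\sqrt{2} \end{bmatrix}, \] 
so (−1,−1) is mapped to (0, −√2). 


(d) (1,0) is mapped to (1/√2, 1/√2), and (0,1) is mapped to (−1/√2, 1/√2). 
However, (1/√2, 1/√2) and (−1/√2, 1/√2) are also unit vectors. So the unit 
vectors are mapped to unit vectors, and a unit square remains a unit 
square. So the area of a unit square is not changed by the 
transformation. (This is actually a general feature of rotations, not 
restricted to the case α = π/4.) 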


Solution to Exercise 25 
D(κ, λ) = I when κ = 1 and λ = 1. 


R(α) = I when α is zero or any even multiple of π, i.e. α = 2nπ, where n 
is any integer. 


Solution to Exercise 26 


on(f els a) [eh 


oF IE 1-ERH8 wee]-B 4 
) 
) 


OE Jb SJ-bGH8 staal-Li J 
of Jb )-B383 Buc)-f q 
The solutions to parts (b) and (c) show that the two matrices do not 
commute. 


Solution to Exercise 28 


The required matrix is given by the matrix product C = R(2) D(2, 1). 
Recalling that sin(π/4) = cos(π/4) = 1/√2, we have 


ola P : _[22@-2O ZO- ZO 
5 wll UY |28+30 40O+30 
-[= 


r-[? ve v8] _ [2Ja) +0) 2-yq) + (7) 
LOU Le ge] L8G) +1G5) (Fe) + 1G) 
ak: —25 
a 


F clearly differs from C. 


Solution to Exercise 30 
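We have 

\[ R(\alpha) R(\beta) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{bmatrix} \] 
\[ = \begin{bmatrix} \cos\alpha\cos\beta - \sin\alpha\sin\beta & -\cos\alpha\sin\beta - \sin\alpha\cos\beta \\ \sin\alpha\cos\beta + \cos\alpha\sin\beta & -\sin\alpha\sin\beta + \cos\alpha\cos\beta \end{bmatrix}. \] 


Now use the trigonometric identities 


cos α cos β ± sin α sin β = cos(α ∓ β), 
sin α cos β ± cos α sin β = sin(α ± β) 
to get 

\[ R(\alpha) R(\beta) = \begin{bmatrix} \cos(\alpha + \beta) & -\sin(\alpha + \beta) \\ \sin(\alpha + \beta) & \cos(\alpha + \beta) \end{bmatrix}. \] 


But the right-hand side of this equation is just the matrix R(α + β). 
Hence we have shown that 
R(α) R(β) = R(α + β). 


Now we can either calculate R(β) R(α), and show explicitly that 
R(α) R(β) = R(β) R(α), or we can save ourselves some work by noticing 
that 


R(α) R(β) = R(α + β) = R(β + α) = R(β) R(α). 


So R(α) and R(β) commute, as expected on geometric grounds. 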
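

The identity R(α) R(β) = R(α + β) can also be spot-checked numerically for particular angles. The sketch below is illustrative only: the helper names and the tolerance are assumptions, and a numerical check for a few angles is no substitute for the algebraic argument above.

```python
import math

def rotation(angle):
    """2 x 2 rotation matrix for an anticlockwise rotation by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def close(A, B, tol=1e-12):
    """True if all corresponding entries agree to within `tol`."""
    return all(abs(a - b) <= tol
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

alpha, beta = 0.7, -1.9  # arbitrary test angles
print(close(matmul(rotation(alpha), rotation(beta)), rotation(alpha + beta)))
print(close(matmul(rotation(alpha), rotation(beta)),
            matmul(rotation(beta), rotation(alpha))))
```

Both lines print True for these angles, in agreement with the result that rotations about the same origin commute.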
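Solution to Exercise 31 


Remembering that cos(−α) = cos(α) and sin(−α) = −sin(α), we have 

\[ R(\alpha) R(-\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} \cos(-\alpha) & -\sin(-\alpha) \\ \sin(-\alpha) & \cos(-\alpha) \end{bmatrix} \] 
\[ = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} = \begin{bmatrix} \cos^2\alpha + \sin^2\alpha & 0 \\ 0 & \cos^2\alpha + \sin^2\alpha \end{bmatrix} = I \] 

and 

\[ R(-\alpha) R(\alpha) = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} = \begin{bmatrix} \cos^2\alpha + \sin^2\alpha & 0 \\ 0 & \cos^2\alpha + \sin^2\alpha \end{bmatrix} = I. \] 


So R(α) is indeed the inverse of R(−α), as claimed. 


In fact, there was an easier way to prove this. From Exercise 30, we know 
that R(α) R(β) = R(β) R(α) = R(α + β). Furthermore from Exercise 25, 
we know that R(0) = I. Hence 


R(α) R(−α) = R(−α) R(α) = R(α − α) = R(0) = I. 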
Solution to Exercise 32 


i d —b||a bj 1 ad—be —ab+ba 
ad — be |—c al| |e dl ad—bc |cd—de —cb+da 


-( i: 


(a) Using the general inversion formula, 


4 ft =i 
=f 


Solution to Exercise 33 


NIP NIE 
eee 
For the matrix D = ABC, the product of three matrices can be 
interpreted as (AB)C or A(BC) since matrix multiplication is 
associative. Adopting the first option, 


p-ape-rapje=([! fh 9) [ty 
=[0 af 
=[ 


Applying the matrix inversion formula, 

=], —_ =a | 2 Al —1 2 _ 1 il —2 

D~ = (ABC) =-4/73 of =2/_1 ol: 

(b) We have 

sjig—t\(h Ot =11 i. = 

ptate ls lo a= lo aa): 
hence 

iyety =i a), A A) it = 2) a) =e 

as ae eee ae O}’ 
which is indeed equal to D⁻¹ = (ABC)⁻¹. 


Solution to Exercise 34 


The matrix 
\[ \begin{bmatrix} -2 & 4 \\ -1 & 2 \end{bmatrix} \] 
has no inverse, because its determinant is −2 × 2 − (−1 × 4) = 0. 


In the other three cases, the determinant is not zero, so inverses will exist. 


Solution to Exercise 35 
det A = 1, det B = —1 and det C = 2. 


It was shown in Exercise 33 that 


Q -—2 
ase=[° ~2) 


Consequently, det(ABC) = —2. 
Thus det(ABC) = det A det B det C. 


You will see later that this is more generally true: the determinant of a 
product of matrices is always equal to the product of the individual 
determinants. 


Solution to Exercise 36 


The columns of A show the effect that it will have on the Cartesian unit 
(column) vectors i and j: they will become the vectors (1,3) and (2,6), 
respectively. These vectors are parallel. Hence A does indeed transform 
the unit square to a geometric object with no area, namely to a finite 
portion of a straight line. 


The point (0,0) is transformed to the point (0,0). The line in question is 
y = 3x, and the unit square is mapped to the portion with 0 ≤ x ≤ 3, 
since the point with coordinates (1,1) is mapped to the point (3,9). The 
result is depicted in the figure below. 


[Figure: the unit square in the (x, y) plane and its image, the segment of the line y = 3x from (0,0) to (3,9)] 


Solution to Exercise 37 


(a) F a ; 3) =e et 2] 5 1) 


il 1(3) 1(0) 1(-—4) 70 =a 
(c) H [3 0 -4) = [39 2(0) 8) =(5 0 7 
29: 7 
a ae 
bs dit 
= Nh 8) £5t) 2118 3(0) +1(3) +2(1) 3(1) +1(0) + 2( | 
~ |0(—2) + 501) + 1(4) 0(0)+5(3) +101) 0(1)+5(0)+1(-1) 
{3% 4 if 
=| 16 I 


(e) The product does not exist, because the left-hand matrix is of order 
3 x 1, whereas the right-hand matrix is of order 2 x 2. 


Solution to Exercise 38 


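Taking care of the order specified, 


\[ C = BA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 0 \end{bmatrix} \] 
and 
\[ D = AB = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}. \] 


So BA ≠ AB, i.e. the rotations A and B do not commute. 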
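

The two products can be confirmed with a few lines of code. This is a minimal sketch using plain lists of rows; the helper name `matmul` is an illustrative choice.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
B = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]

print(matmul(B, A))  # [[0, -1, 0], [0, 0, -1], [1, 0, 0]]
print(matmul(A, B))  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(matmul(B, A) == matmul(A, B))  # False
```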


Solution to Exercise 39 


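(b) We have 
\[ (A + B)^T = \left( \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} + \begin{bmatrix} 2 & 5 \\ -1 & -4 \\ 3 & 1 \end{bmatrix} \right)^T = \begin{bmatrix} 3 & 7 \\ 2 & 0 \\ 8 & 7 \end{bmatrix}^T = \begin{bmatrix} 3 & 2 & 8 \\ 7 & 0 & 7 \end{bmatrix} \] 
and 
\[ A^T + B^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} + \begin{bmatrix} 2 & -1 & 3 \\ 5 & -4 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 2 & 8 \\ 7 & 0 & 7 \end{bmatrix}. \] 
Thus (A + B)^T = A^T + B^T. 

(c) We have 
\[ (AC)^T = \left( \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \right)^T = \begin{bmatrix} 5 & 6 \\ 11 & 12 \\ 17 & 18 \end{bmatrix}^T = \begin{bmatrix} 5 & 11 & 17 \\ 6 & 12 & 18 \end{bmatrix} \] 
and 
\[ C^T A^T = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 5 & 11 & 17 \\ 6 & 12 & 18 \end{bmatrix}. \] 
Thus (AC)^T = C^T A^T. 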
Solution to Exercise 40 
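(a) The determinant of 
\[ A = \begin{bmatrix} 2 & -1 & 2 \\ 5 & 1 & -1 \\ 2 & 1 & -1 \end{bmatrix} \] 
is 
\[ \det A = 2 \begin{vmatrix} 1 & -1 \\ 1 & -1 \end{vmatrix} + \begin{vmatrix} 5 & -1 \\ 2 & -1 \end{vmatrix} + 2 \begin{vmatrix} 5 & 1 \\ 2 & 1 \end{vmatrix} \] 


= 2 × (−1 + 1) + (−5 + 2) + 2 × (5 − 2) = 0 − 3 + 6 = 3. 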

ib 2 =H 

(b) The determinant of B = i 0 -1) i 
3-1 


2 © 
= −2 × (0 + 2) − 1 × (−3 − 0) = −4 + 3 = −1. 


et B=0-2 


Solution to Exercise 41 


If we follow the last rule by adding twice the first column to the third, we 
do not change the determinant but we do obtain a column of zeros. Hence 
the determinant is zero. 


Solution to Exercise 42 


(a) We have 
i j k 
5 0 2 0 2 5 
rxs=|2 5 0 -| i-| + c 
5 1 gf cd 2) 2 ae" 2 2 


= 10i — 4j — 12k = (10, —4, —12). 
(b) The area is given by |r1 × r2|, where 


: 4. i 
1 0 1 0 1 1 

ry Xrg=]1 1 o| =| - i-| lit ik 
22 9 2 2 2 2 2 2 


= —2i+ 2j = (—2,2,0). 
So, taking the magnitude, the area is 


|r1 × r2| = √((−2)² + 2²) = √8 = 2√2. 


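(c) The volume is given by |a · (b × c)|, where the sides may be taken in 
any order. We have 


\[ \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} 2 & 5 & 0 \\ 1 & -2 & 0 \\ 1 & -3 & 2 \end{vmatrix} = 2 \begin{vmatrix} -2 & 0 \\ -3 & 2 \end{vmatrix} - 5 \begin{vmatrix} 1 & 0 \\ 1 & 2 \end{vmatrix} + 0 \begin{vmatrix} 1 & -2 \\ 1 & -3 \end{vmatrix} \] 


= 2(−4) − 5(2) = −18. 
So, taking the magnitude, the volume is 
|a · (b × c)| = 18. 


Note that the determinant may be obtained more quickly by 
expanding on the third column rather than the first row, in which case 
we would get 


\[ \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = 0 - 0 + 2 \begin{vmatrix} 2 & 5 \\ 1 & -2 \end{vmatrix} = 2(-4 - 5) = -18, \] 
as before. 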
Solution to Exercise 43 
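The cofactors of 
\[ A = \begin{bmatrix} 2 & -1 & 2 \\ 5 & 1 & -1 \\ 2 & 1 & -1 \end{bmatrix} \] 
are 
\[ C_{11} = +\begin{vmatrix} 1 & -1 \\ 1 & -1 \end{vmatrix} = 0, \quad C_{12} = -\begin{vmatrix} 5 & -1 \\ 2 & -1 \end{vmatrix} = 3, \quad C_{13} = +\begin{vmatrix} 5 & 1 \\ 2 & 1 \end{vmatrix} = 3, \] 
\[ C_{21} = -\begin{vmatrix} -1 & 2 \\ 1 & -1 \end{vmatrix} = 1, \quad C_{22} = +\begin{vmatrix} 2 & 2 \\ 2 & -1 \end{vmatrix} = -6, \quad C_{23} = -\begin{vmatrix} 2 & -1 \\ 2 & 1 \end{vmatrix} = -4, \] 
\[ C_{31} = +\begin{vmatrix} -1 & 2 \\ 1 & -1 \end{vmatrix} = -1, \quad C_{32} = -\begin{vmatrix} 2 & 2 \\ 5 & -1 \end{vmatrix} = 12, \quad C_{33} = +\begin{vmatrix} 2 & -1 \\ 5 & 1 \end{vmatrix} = 7. \] 
Thus 
\[ C = \begin{bmatrix} 0 & 3 & 3 \\ 1 & -6 & -4 \\ -1 & 12 & 7 \end{bmatrix} \quad \text{and} \quad C^T = \begin{bmatrix} 0 & 1 & -1 \\ 3 & -6 & 12 \\ 3 & -4 & 7 \end{bmatrix}. \] 
So 
\[ A^{-1} = \frac{1}{\det A} C^T = \frac{1}{3} \begin{bmatrix} 0 & 1 & -1 \\ 3 & -6 & 12 \\ 3 & -4 & 7 \end{bmatrix}. \] 


As usual, you should check that AA⁻¹ = I by explicit multiplication. 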
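

The cofactor method lends itself to a short program, which also carries out the suggested check. The sketch below uses exact fractions from the Python standard library; the function names `minor`, `det` and `inverse` are illustrative, and the recursive determinant is only sensible for small matrices.

```python
from fractions import Fraction

def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [[M[r][c] for c in range(len(M)) if c != j]
            for r in range(len(M)) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inverse(M):
    """Inverse as the transposed matrix of cofactors divided by the determinant."""
    n, d = len(M), det(M)
    if d == 0:
        raise ValueError("matrix has zero determinant, so no inverse exists")
    cof = [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)]
           for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

A = [[2, -1, 2], [5, 1, -1], [2, 1, -1]]
A_inv = inverse(A)
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(det(A))                                            # 3
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])      # True
```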
Unit 5 


Linear algebra 


Introduction 


Figure 1 shows the network of water pipes connecting the pump and five 
taps that control the heating system in a (fictional) greenhouse. The 
engineer who maintains the pump can determine the pressure P at which 
hot water is supplied to the system, and the tightness Ti (i = 1, ..., 5) of 
each tap. He knows from experience that making the system perform as 
required throughout the day is no easy matter. The effect of each part of 
the system depends on the rate at which hot water flows through it. 
Because the system is closed, all the water that flows into any junction 
must flow out again. As a result, there are only three key rates of flow, 
namely f1, f2 and f3, that together determine the rate of flow in every 
part of the system (see Figure 1). However, the equations that relate these 
flow rates to the supply pressure and tap settings are not simple. 


The engineer knows that the system is described by the following 
simultaneous equations (which you are not expected to derive): 


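T1 f1 + T4 f1 − T4 f2 = P, 
T2 f2 + T5 f2 − T5 f3 − T4 f1 + T4 f2 = 0, 
T3 f3 − T5 f2 + T5 f3 = 0. 


These equations can be expressed in terms of matrices as 


\[ \begin{bmatrix} T_1 + T_4 & -T_4 & 0 \\ -T_4 & T_2 + T_4 + T_5 & -T_5 \\ 0 & -T_5 & T_3 + T_5 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} = \begin{bmatrix} P \\ 0 \\ 0 \end{bmatrix}. \] 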


Solving this matrix equation is typical of the kind of problem that is best 
treated using the methods of linear algebra. 


Linear algebra is a branch of mathematics that grew out of the solution of 
systems of equations like that given above. The equations are said to be 
linear because the variables of interest, in this case the flow rates f1, f2 
and f3, appear only to the first degree in each term; there are no powers of 
flow rates, such as f1² or f2³, nor are there any products of flow rates, such 
as f1 f2 or f2 f3. Although there are several variables, the equations are a 
generalisation of the kind of linear equation that describes a straight line 
in two-dimensional Cartesian coordinates, y = mx + c. This ‘linearity’ 
makes the equations well suited to representation by matrices, and once 
they are in matrix form, solving them is assisted by everything that 
mathematicians have learned from their studies of matrix algebra and 
determinants. Linear algebra is the subject that gathers together those 
insights and methods, and generalises them to create a coherent body of 
mathematical knowledge rather than an assortment of calculational tricks. 


Linear algebra is an important topic in modern mathematics. It can be 
approached in many different ways, ranging from the very abstract, in 
which there is little or no mention of linear equations, matrices or 
determinants, to the very practical, in which there is a concentration on 
matrix methods and the practicalities of computer-based calculations. 


Figure 1 The pump and network of taps that control a hot water heating system 


The prefix eigen comes from 
German and indicates ‘inherent’, 
as these things are inherent to a 
matrix. 
In this unit we aim to steer a middle course, emphasising the language of 
matrices and determinants while providing an introduction to those parts 
of the subject that are most relevant to applications without getting 
bogged down in calculational details. With this aim in mind, we will 
concentrate on two important approaches to problem solving that are 
characteristic of linear algebra. The first is the use of ‘row operations’, 
which form the basis of a range of methods for solving systems of linear 
equations. The second is the subject of ‘eigenvalues’ and ‘eigenvectors’, 
which are numbers and vectors associated with a matrix. As you will see, 
knowing the eigenvalues and eigenvectors of a matrix can greatly assist the 
solution of problems involving that matrix. 


Study guide 


Section 1 shows how to use row operations to solve a system of 
n simultaneous equations of the form Ax = b, where A is a given n × n 
matrix, b is a given column vector with n elements, and we are required to 
find x, the n-element column vector that represents the solution. The 
method that we will use is called Gaussian elimination. 


Section 2 addresses the question of determining the result xk = A^k x0 of 
k applications of a square matrix A to an initial column vector x0. This 
leads us to study the equation Ax = λx, where the scalar λ is called an 
eigenvalue of the matrix A, with x being the corresponding eigenvector. 


Section 3 gives the general method for finding eigenvalues and 
eigenvectors. (This will be of particular use in the next unit.) 


In each of Sections 1 and 3, there are procedures for solving problems in 
linear algebra. Take care that you understand the examples given for these 
procedures and can do the corresponding exercises; these give a good idea 
of the assessable learning outcomes of this unit. 


Section 4, which is optional, contains some further results concerning the 
eigenvalues and eigenvectors of symmetric matrices. 


1 Linear algebra, row operations and 
Gaussian elimination 


1.1 Linear equations and their manipulation 


You should already be familiar with the notion of a linear equation as a 
generalisation of y = mx + c, but let us start with a definition so that 
there can be no doubt. 
Linear equation 


An equation involving n variables x1, x2, ..., xn is said to be linear in 
each of those variables if it can be written in the form 


a1 x1 + a2 x2 + a3 x3 + ··· + an xn = d, 


where a1, a2, ..., an and d are constants. 


Example 1 


Show that the Cartesian equation of a straight line, y = mx + c, is a linear 
equation by identifying each of the variables and constants. 


Solution 


The equation can be rewritten in the form y — mx = c. If we compare this 
with the general form of a linear equation of n = 2 variables, we can see 
that the rewritten Cartesian equation is linear if we make the 
identifications a1 = 1, x1 = y, a2 = −m, x2 = x and d = c. 


Exercise 1 


The Cartesian equation of a plane that passes through the origin of a 
three-dimensional system of coordinates is ax + by + cz = 0, where a, b 
and c are constants. Is this equation linear in the variables x, y and z? 
Justify your answer. 


A very simple system of linear equations to solve would consist of the two 
equations 

3x + 2y = 8, 

x − y = 1. 
In order to solve them, we should try to isolate one of the variables, x or y. 
In this case that can easily be done by adding twice the second equation to 
the first. This eliminates y and produces the equation 

5x = 10, 
implying that x = 2. Substituting this back into either of the original 
equations immediately shows that y = 1. 


What follows will not always be so easy, but much of this first section will 
essentially be a systematic elaboration of the manipulation that has just 
been carried out. 
Consider first a system of three simultaneous linear equations, (E1), (E2) 
and (E3): 


x1 − 4x2 + 2x3 = −9, (E1) 
3x1 − 2x2 + 3x3 = 7, (E2) 
8x1 − 2x2 + 9x3 = 34, (E3) 


with three unknowns, x1, x2 and x3. 


We may obtain a system of two simultaneous equations, for x2 and x3, by 
subtracting suitable multiples of (E1) from (E2) and (E3). In this example, 
we may subtract 3 times (E1) from (E2), to obtain 


10x2 − 3x3 = 34. (E2a) = (E2) − 3(E1) 


(Note that we have shown on the right the manipulations required to 
obtain the equation on the left.) We may also subtract 8 times (E1) from 
(E3), to obtain 


30x2 − 7x3 = 106. (E3a) = (E3) − 8(E1) 
So now we have two simultaneous equations, neither of which involves x1. 
Next we may subtract 3 times (E2a) from (E3a), to obtain 

2x3 = 4, (E3b) = (E3a) − 3(E2a) 


which tells us that x3 = 4/2 = 2. Substituting this result back into (E2a) 
tells us that x2 = (3x3 + 34)/10 = 4, and substituting the values for x3 and 
x2 back into (E1) tells us that x1 = 4x2 − 2x3 − 9 = 3. So the complete 
solution of the system is x1 = 3, x2 = 4, x3 = 2. 


Next we examine a convenient way to represent this process, using matrix 
notation. 


1.2 The augmented matrix and row operations 


Given a set of linear equations, the first thing to do is write them in a 
standard way, with the constants on the right and the three variable terms 
on the left. Equations (E1), (E2) and (E3) above are written in this way. 
Notice how the variable terms in x1, x2 and x3 are lined up in separate 
vertical columns. If there are any missing terms (where the coefficient is 
zero), leave a gap in the column. The form of equations (E1), (E2) 
and (E3) immediately suggests that they can be expressed as a single 
matrix equation 


\[ \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}. \qquad (1) \] 


The correctness of this suggestion can be confirmed by working out the 
matrix multiplication on the left-hand side. This gives 


\[ \begin{bmatrix} x_1 - 4x_2 + 2x_3 \\ 3x_1 - 2x_2 + 3x_3 \\ 8x_1 - 2x_2 + 9x_3 \end{bmatrix} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}. \] 
The matrix on the left can equal the matrix on the right only if 
corresponding elements are equal, which gives the three equations (E1), 
(E2) and (E3). 


We can represent the matrix equation (1) symbolically by 

\[ A\mathbf{x} = \mathbf{b}, \quad \text{where } A = \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}. \] 


Any system of linear equations may be represented in a similar way, even if 
some initial rearrangement is needed to achieve the required layout. Once 
the equations have been expressed in matrix form it is conventional to call 
A the coefficient matrix, x the unknown vector, and b the 
right-hand-side vector or sometimes the constant vector since it 
represents the terms that are independent of the variables. 


Remember that the emboldening of printed symbols, such as A, x, b, can be 
indicated in handwritten work by using an underline, as in A, x, b. It is 
particularly important to remember this when using a symbol to represent 
a vector quantity. 


Exercise 2 
Write each of the following systems in the matrix form Ax = b. 
(a) x1 + x2 − x3 = 2, 
5x1 + 2x2 + 2x3 = 20, 
4x1 − 2x2 − 3x3 = 15. 
(b) −2 + x2 − x3 = −x1, 
−2x3 − 2x2 = 5x1 − 20, 
−2x2 − 3x3 = 15 − 4x1. 


(c) 2x + 3y − 4z = 0, 
2x + 3z = 3, 
6y − 2z = 0. 


Using matrix notation, we could represent each of the intermediate stages 
of the calculation in Subsection 1.1 as a matrix equation. All we would 
have to do is to change the coefficient matrix and the right-hand-side 
vector in an appropriate way at each stage. You are welcome to check this 
for yourself, but doing so is really something of a waste of time since it 
would involve a lot of repetitive writing that would not achieve anything 
new. A better approach is to introduce a new ‘shorthand’ that captures 
just the essential information and the steps in the calculation. To do this, 
we introduce a notational device called the augmented matrix that 
combines the elements of the coefficient matrix and the right-hand-side 
vector. For the matrix equation (1) given above, the augmented matrix is 
written as 

\[ A|\mathbf{b} = \left[\begin{array}{rrr|r} 1 & -4 & 2 & -9 \\ 3 & -2 & 3 & 7 \\ 8 & -2 & 9 & 34 \end{array}\right]. \] 


Don’t look for any deep significance in the augmented matrix. It’s not a 
new ‘kind’ of matrix, just a useful means of efficiently displaying the 
crucial information in a system of linear equations. To show how that 
information changes as the equations are manipulated, it is helpful to label 
each row. We do this with a bold R (for row) and a subscript as follows: 


\[ A|\mathbf{b} = \left[\begin{array}{rrr|r} 1 & -4 & 2 & -9 \\ 3 & -2 & 3 & 7 \\ 8 & -2 & 9 & 34 \end{array}\right] \begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}. \] 


Using these notational devices, we can compactly indicate any step in the 
manipulation of the simultaneous equations. For instance, suppose that we 
want to change the second equation by subtracting three times the first 
equation from it. We can show this using the augmented matrix by doing 
the following steps. 


1. Write down row R1 unchanged. 


2. Indicate the planned change as R2 − 3R1, and write that to the left of 
the second row. 


3. Write the values resulting from the change in the second row. 

4. Relabel the second row as R2a. 

5. Write down row R3 unchanged. 

So in this particular case we get 


\[ \begin{array}{r} ~ \\ R_2 - 3R_1 \\ ~ \end{array} \left[\begin{array}{rrr|r} 1 & -4 & 2 & -9 \\ 0 & 10 & -3 & 34 \\ 8 & -2 & 9 & 34 \end{array}\right] \begin{array}{l} R_1 \\ R_{2a} \\ R_3 \end{array}. \] 


Having just reduced the coefficient of x1 to 0 in the second row, the next 
step in solving the simultaneous equations is to make a similar reduction in 
the third row. This is again achieved by subtracting (or adding) an 
appropriate multiple of another entire row. In this case we subtract 
8 times R1 from R3, indicating this as follows: 


\[ \begin{array}{r} ~ \\ ~ \\ R_3 - 8R_1 \end{array} \left[\begin{array}{rrr|r} 1 & -4 & 2 & -9 \\ 0 & 10 & -3 & 34 \\ 0 & 30 & -7 & 106 \end{array}\right] \begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}. \] 


Here we are again showing the steps that must be taken on the left, the 
resulting augmented matrix in the middle, and the updated row labels on 
the right. Our next step is to reduce to zero the coefficient of x2 in the 
third row without upsetting the pattern of zeros in the first column. This 
is achieved as follows: 

\[ \begin{array}{r} ~ \\ ~ \\ R_{3a} - 3R_{2a} \end{array} \left[\begin{array}{rrr|r} 1 & -4 & 2 & -9 \\ 0 & 10 & -3 & 34 \\ 0 & 0 & 2 & 4 \end{array}\right]. \] 


Note that we had to subtract a multiple of R2a here. Had we tried to 
subtract a multiple of R1 from R3a, we would have gained an unwanted 
non-zero entry in the first column of the third row. 


Since we have completed all the row operations involved in our particular 
problem, we have deliberately omitted an updated set of row labels to the 
right of the final augmented matrix, though we could easily have included 
them if they were needed. 


Note that all the operations that we have recorded on the left have 
involved subtracting multiples of one row from another row. These are all 
examples of what are generally referred to as row operations on the 
augmented matrix. The full range of allowable row operations mimics 
those steps that we might have taken in solving the original system of 
equations. They can all be built from successive combinations of the 
following three elementary row operations. 


Elementary row operations 


• Interchange any two complete rows (Ri ↔ Rj). 

• Multiply each element in a row by the same non-zero constant 
(Ri → λRi). 

• Add (or subtract) a constant multiple of one row to (or from) 
another row (Ri → Ri + λRj). 


Note that row operations always involve entire rows of the augmented 
matrix, thus including the entries from the right-hand-side vector as well 
as those from the coefficient matrix. 

The overall effect of the row operations involved in our example has been 
to produce a triangle of zeros in the bottom left-hand corner of the 
augmented matrix. If we use that augmented matrix to write down the 
final set of linear equations in matrix form, we obtain 


\[ \begin{bmatrix} 1 & -4 & 2 \\ 0 & 10 & -3 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -9 \\ 34 \\ 4 \end{bmatrix}. \] 


The coefficient matrix is now said to be an upper triangular matrix, 
since the only non-zero elements that it contains are on or above the 
leading diagonal, i.e. the diagonal from top left to bottom right, here 
containing the numbers 1, 10 and 2. (The leading diagonal is sometimes 
called the main diagonal.) 


Performing the matrix multiplication on the left-hand side gives the linear 
equations 
x1 − 4x2 + 2x3 = −9, 
10x2 − 3x3 = 34, 
2x3 = 4. 


Note that the upper triangular structure of the coefficient matrix means 
that each successive equation has one fewer unknown than its predecessor. 
Thanks to our successive elimination of unknown variables, we can see 
from the last equation that x3 = 2, and we can substitute that into the 
equation immediately above it to find that x2 = 4. This procedure is called 
back substitution. Having found the values of x2 and x3, we can 
perform another back substitution to find x1 = 3. 


So we have determined our solution x1 = 3, x2 = 4, x3 = 2. However, it is 
always good practice to check it by writing down the solution in matrix 
form, and then using matrix multiplication to confirm that it solves the 
original matrix equation. 


In this case x = [3 4 2]^T, and we can check its correctness as follows: 


\[ \begin{bmatrix} 1 & -4 & 2 \\ 3 & -2 & 3 \\ 8 & -2 & 9 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \times 3 - 4 \times 4 + 2 \times 2 \\ 3 \times 3 - 2 \times 4 + 3 \times 2 \\ 8 \times 3 - 2 \times 4 + 9 \times 2 \end{bmatrix} = \begin{bmatrix} -9 \\ 7 \\ 34 \end{bmatrix}. \] 
8 -—2 9] |2 8x3-2x4+9x2 34 
So our solution is correct. Even more pleasingly, it has been arrived at in a 
systematic way that can provide the basis of a general method for solving 
systems of simultaneous linear equations. This is the subject of the next 
subsection. 
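

Back substitution and the final check are both easy to mechanise. The sketch below is one possible implementation, assuming an upper triangular coefficient matrix with non-zero entries on its leading diagonal; the function name `back_substitute` is an illustrative choice, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def back_substitute(U, c):
    """Solve Ux = c, where U is upper triangular with a non-zero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):  # work upwards from the last equation
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(c[i] - known) / U[i][i]
    return x

U = [[1, -4, 2], [0, 10, -3], [0, 0, 2]]
c = [-9, 34, 4]
x = back_substitute(U, c)
print([str(v) for v in x])  # ['3', '4', '2']

# Check against the original matrix equation (1).
A = [[1, -4, 2], [3, -2, 3], [8, -2, 9]]
b = [-9, 7, 34]
print([sum(a * v for a, v in zip(row, x)) for row in A] == b)  # True
```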


1.3 Gaussian elimination: non-singular cases 


The technique that we have just been using is an example of Gaussian 
elimination — a general method for finding the unique solution of a system 
of linear equations, when such a solution exists. This subsection provides a 
more formal introduction to that method. As you will soon see, there are 
cases where no solution exists, other cases where an infinite number of 
solutions exist, and some cases where the solution is unique but the system 
of equations must be reformulated before Gaussian elimination can be used 
to find it. In this subsection and the one that follows, we will use examples 
and exercises to explore each of these exceptional cases, but first we set out 
a general procedure that is enough by itself to solve most cases of practical 
interest. 


Procedure 1 Gaussian elimination 


To solve a system of n linear equations in n unknowns, with 
coefficient matrix A and right-hand-side vector b, it is often sufficient 
to carry out the following three steps. 


1. Formulation: Write down the augmented matrix A|b with rows 
R1, ..., Rn. 


2. Elimination: Adapt the following row operations as necessary. 


(a) Subtract a multiple of R1 from R2, to reduce to zero the 
first element in the first column below the leading diagonal. 


(b) Similarly, subtract a multiple of R1 from R3, ..., Rn, to 
reduce to zero all the other elements in the first column 
below the leading diagonal. 


(c) In the new matrix obtained, subtract multiples of R2 from 
R3, ..., Rn to reduce to zero all the elements in the second 
column below the leading diagonal. 


(d) Continue this process until A|b is reduced to U|c, where 
U is an upper triangular matrix. 


3. Solution: Solve the system of equations with coefficient 
matrix U and right-hand-side vector c by back substitution. 


Remember that performing a row operation generally involves three steps: 

• write down the plan 
• implement the plan 
• relabel the rows. 


Though not part of the procedure, it is generally good practice to write the 
solution as a column matrix, and then confirm by matrix multiplication 
that it is indeed a solution of the original system of equations. 
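

Procedure 1 can be sketched as a short program. This is an illustrative implementation under stated assumptions, not a definitive one: the function name is hypothetical, exact fractions are used so that no rounding occurs, and (like the procedure as stated) it simply stops with an error if a zero appears on the leading diagonal.

```python
from fractions import Fraction

def gaussian_elimination(A, b):
    """Solve Ax = b by Procedure 1 (no row interchanges)."""
    n = len(A)
    # Step 1, formulation: build the augmented matrix A|b.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    # Step 2, elimination: clear each column below the leading diagonal.
    for col in range(n):
        if M[col][col] == 0:
            raise ValueError("zero on the leading diagonal: Procedure 1 fails")
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            M[row] = [x - factor * y for x, y in zip(M[row], M[col])]
    # Step 3, solution: back substitution on U|c.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

A = [[1, -4, 2], [3, -2, 3], [8, -2, 9]]
b = [-9, 7, 34]
print([str(v) for v in gaussian_elimination(A, b)])  # ['3', '4', '2']
```

Applied to the matrix equation (1), the sketch reproduces the solution found by hand in Subsection 1.2.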
Example 2 
Solve the simultaneous equations 
3x1 + x2 − x3 = 1, 
5x1 + x2 + 2x3 = 6, 
4x1 − 2x2 − 3x3 = 3, 
and check the solution by matrix multiplication. 
Solution 


Following Procedure 1, starting with the formulation, the augmented 
matrix representing these equations is 


\[ \left[\begin{array}{rrr|r} 3 & 1 & -1 & 1 \\ 5 & 1 & 2 & 6 \\ 4 & -2 & -3 & 3 \end{array}\right] \begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}. \] 


Moving on to the elimination stage, to reduce the elements below the 
leading diagonal in column 1 to zero, replace R2 by R2 − (5/3)R1 and call the 
result R2a, then replace R3 by R3 − (4/3)R1 and call the result R3a: 


\[ \begin{array}{r} ~ \\ R_2 - \frac{5}{3} R_1 \\ R_3 - \frac{4}{3} R_1 \end{array} \left[\begin{array}{rrr|r} 3 & 1 & -1 & 1 \\ 0 & -\frac{2}{3} & \frac{11}{3} & \frac{13}{3} \\ 0 & -\frac{10}{3} & -\frac{5}{3} & \frac{5}{3} \end{array}\right] \begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}. \] 


To reduce the element below the leading diagonal in column 2 to zero, 
replace R3a by R3a − 5R2a: 


\[ \begin{array}{r} ~ \\ ~ \\ R_{3a} - 5R_{2a} \end{array} \left[\begin{array}{rrr|r} 3 & 1 & -1 & 1 \\ 0 & -\frac{2}{3} & \frac{11}{3} & \frac{13}{3} \\ 0 & 0 & -20 & -20 \end{array}\right]. \] 


This completes the elimination stage. We now have to solve the equations 
represented by the new matrix, i.e. 


3x1 + x2 − x3 = 1, 
−(2/3)x2 + (11/3)x3 = 13/3, 
−20x3 = −20. 


It is clear from the last equation that x3 = 1. So by back substitution, 
x2 = (1/2)(11x3 − 13) = −1 and x1 = (1/3)(1 − x2 + x3) = 1. 


We may verify that x = [1 −1 1]^T is a solution of the matrix version of 
the equations as follows: 


\[ \begin{bmatrix} 3 & 1 & -1 \\ 5 & 1 & 2 \\ 4 & -2 & -3 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 - 1 - 1 \\ 5 - 1 + 2 \\ 4 + 2 - 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 3 \end{bmatrix}. \] 


It would be neater to replace R2 by 3R2 − 5R1 and R3 by 
3R3 − 4R1, since this would avoid the use of fractions. We do 
not do this because it would involve combining elementary 
row operations where we prefer to use them individually at this 
stage. Nonetheless, such shortcuts are common, and we 
will use them later in the unit. You will probably use them 
yourself as your confidence grows. 


Figure 2. An ancient 
Exercise 3 
Solve the simultaneous equations 
x1 + x2 − x3 = 2, 
5x1 + 2x2 + 2x3 = 20, 
4x1 − 2x2 − 3x3 = 15, 

and check the solution by matrix multiplication. 


All the examples of Gaussian elimination considered so far have started 
with a non-zero coefficient on the first variable in the first equation. One 
problem with Procedure 1 is that it will not work if that first entry in R1 
is zero, since we would then have to divide by zero, which is not allowed. 
Fortunately, this is easy to overcome. All we have to do is reorder the 
equations in an appropriate way before formulating the augmented matrix. 
However, there is a deeper lesson to be learned from this. Reordering the 
equations is equivalent to interchanging complete rows in the augmented 
matrix, which is one of the allowed elementary row operations. Cases often 
arise in which it is advantageous to make such an interchange either before 
or even during the elimination stage. The following example illustrates 
this. (It is based on a problem set and solved in the Chinese text Nine 
Chapters on the Mathematical Art (Figure 2) written between 200 BC and 
100 BC.)


Example 3 


There are three types of corn, of which three bundles of the first, two of 
the second, and one of the third make 39 measures. Two of the first, three 
of the second and one of the third make 34 measures. And one of the first, 
two of the second and three of the third make 26 measures. How many 
measures of corn are contained in one bundle of each type? 


Solution 


Let $x_i$ represent the number of measures in one bundle of type $i$, with
$i = 1, 2, 3$. The problem is then specified by the simultaneous equations

\[
\begin{aligned}
3x_1 + 2x_2 + x_3 &= 39, \\
2x_1 + 3x_2 + x_3 &= 34, \\
x_1 + 2x_2 + 3x_3 &= 26.
\end{aligned}
\]


If we just press ahead and solve this problem as given, we will have to
introduce fractions as soon as we try to eliminate $x_1$ from the second row.
We can avoid the immediate use of fractions if we first interchange the first
and third equations, so that the coefficient of $x_1$ in the first equation is 1.
The augmented matrix of the reordered set of equations is then

\[
\left[\begin{array}{rrr|r}
1 & 2 & 3 & 26 \\
2 & 3 & 1 & 34 \\
3 & 2 & 1 & 39
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

Reducing the coefficients below the leading diagonal in the first column to
zero gives

\[
\begin{array}{l} \phantom{R_1} \\ R_2 - 2R_1 \\ R_3 - 3R_1 \end{array}
\left[\begin{array}{rrr|r}
1 & 2 & 3 & 26 \\
0 & -1 & -5 & -18 \\
0 & -4 & -8 & -39
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]


Reducing the coefficient below the leading diagonal in the second column 
to zero produces 


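\[
\begin{array}{l} \phantom{R_1} \\ \phantom{R_1} \\ R_{3a} - 4R_{2a} \end{array}
\left[\begin{array}{rrr|r}
1 & 2 & 3 & 26 \\
0 & -1 & -5 & -18 \\
0 & 0 & 12 & 33
\end{array}\right]
\]

This completes the elimination. Back substitution then gives
$x_3 = \frac{33}{12} = \frac{11}{4}$, $x_2 = 18 - 5x_3 = \frac{17}{4}$ and
$x_1 = 26 - 2x_2 - 3x_3 = \frac{37}{4}$.

Now check that the solution
$\mathbf{x} = [\frac{37}{4} \;\; \frac{17}{4} \;\; \frac{11}{4}]^T$ satisfies the
original matrix equation:

\[
\frac{1}{4}
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 37 \\ 17 \\ 11 \end{bmatrix}
= \frac{1}{4}
\begin{bmatrix} 37 + 34 + 33 \\ 74 + 51 + 11 \\ 111 + 34 + 11 \end{bmatrix}
= \begin{bmatrix} 26 \\ 34 \\ 39 \end{bmatrix}.
\]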


You may find it convenient to rearrange the data in the following exercise. 


Exercise 4 


The economy of Ruritania has collapsed, and each of its three banks has 
been nationalised. Under state ownership, all bank employees are divided 
into three grades: managers, software engineers and clerks, with members 
of a given grade paid identically, without bonuses. Bank A employs three 
managers, two software engineers and 24 clerks. Bank B employs one 
manager, one software engineer and 26 clerks. Bank C employs no 
managers, three software engineers and 25 clerks. Under this radically new 
dispensation, the monthly salary bill of each bank is 137 thousand rurs, 
where the rur is the name for the unit of the (recently devalued) currency. 
How many thousand rurs per month are received by a manager, by a 
software engineer and by a clerk? 


In some cases it may be necessary to interchange a pair of rows later in the 
process, in order to achieve an upper triangular matrix at the end of the 
elimination stage. Here is an example where this happens. 


Example 4 

Solve the following system of equations: 
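\[
\begin{aligned}
x_1 + 10x_2 - 3x_3 &= 8, \\
x_1 + 10x_2 + 2x_3 &= 13, \\
x_1 + 4x_2 + 2x_3 &= 7.
\end{aligned}
\]

Solution

The corresponding augmented matrix is

\[
\left[\begin{array}{rrr|r}
1 & 10 & -3 & 8 \\
1 & 10 & 2 & 13 \\
1 & 4 & 2 & 7
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]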


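To reduce the elements below the leading diagonal in column 1 to zero,
replace $R_2$ by $R_2 - R_1$ and call the result $R_{2a}$, then replace $R_3$ by
$R_3 - R_1$ and call the result $R_{3a}$:

\[
\begin{array}{l} \phantom{R_1} \\ R_2 - R_1 \\ R_3 - R_1 \end{array}
\left[\begin{array}{rrr|r}
1 & 10 & -3 & 8 \\
0 & 0 & 5 & 5 \\
0 & -6 & 5 & -1
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]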


In this case there is no need to perform another subtraction because a zero
has fortuitously appeared in $R_{2a}$. All that is needed is an interchange of
the last two rows:

\[
\begin{array}{l} \phantom{R_1} \\ R_{3a} \\ R_{2a} \end{array}
\left[\begin{array}{rrr|r}
1 & 10 & -3 & 8 \\
0 & -6 & 5 & -1 \\
0 & 0 & 5 & 5
\end{array}\right]
\]

It follows that $x_3 = 1$, and by back substitution
$x_2 = -\frac{1}{6}(-1 - 5x_3) = 1$ and $x_1 = 8 - 10x_2 + 3x_3 = 1$. (As usual,
the matrix solution $\mathbf{x} = [1 \;\; 1 \;\; 1]^T$ should be confirmed as
correct, as a check.)
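The row interchange used in Example 4 can be built into the elimination stage itself. The following sketch is an assumption about how one might code it, not the unit's own procedure: the function names are hypothetical, and the rule "swap with the first lower row that has a non-zero entry in the pivot column" is just one simple choice.

```python
from fractions import Fraction

def eliminate_with_interchange(aug):
    """Reduce [A | b] to upper triangular form, swapping rows on a zero pivot."""
    n = len(aug)
    rows = [[Fraction(v) for v in row] for row in aug]
    for col in range(n):
        # Find the first row at or below the diagonal with a non-zero entry.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no non-zero pivot: the system is singular")
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]  # row interchange
        for r in range(col + 1, n):
            m = rows[r][col] / rows[col][col]
            rows[r] = [a - m * p for a, p in zip(rows[r], rows[col])]
    return rows

def back_substitute(rows):
    """Solve an upper triangular augmented system from the last row upwards."""
    n = len(rows)
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        known = sum(rows[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (rows[r][n] - known) / rows[r][r]
    return x

upper = eliminate_with_interchange([[1, 10, -3, 8],
                                    [1, 10, 2, 13],
                                    [1, 4, 2, 7]])
solution = back_substitute(upper)
print(upper)
print(solution)
```

Applied to the augmented matrix of Example 4, the zero that appears on the diagonal in the second column triggers the same interchange of the last two rows that was made by hand.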


In the previous example, the final exchange of rows was made to achieve
an upper triangular matrix. However, it would have been possible to have
solved the system of equations as soon as the unexpected zero appeared
in $R_{2a}$. In each of the following exercises, a solution becomes possible as
soon as $x_1$ is eliminated from all but one of the equations, even though an
upper triangular matrix is not apparent. It is left to you to decide whether
to make use of this potential shortcut or to follow the systematic approach
adopted earlier.


Exercise 5 


Solve the system of equations whose augmented matrix is 


0 0 14] 2 

2 7 Oc ]'5 

1 2 0.) 7% 
Exercise 6 


Solve the system of equations whose augmented matrix is 
1 
1 
1 
4 


Re wre 

A comment on the efficiency of Gaussian elimination


All of the examples and exercises in this subsection have involved a square
coefficient matrix $\mathbf{A}$ with $\det \mathbf{A} \neq 0$. Such matrices are
described as non-singular and, as explained in Unit 4, are invertible, meaning
that it is possible to construct an inverse matrix, denoted $\mathbf{A}^{-1}$,
such that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. Such matrices may be
contrasted with singular matrices that have $\det \mathbf{A} = 0$ and are
non-invertible. The non-singular case is important because of the following
general result.


The system of linear equations represented by the matrix equation
$\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A}$ is a square matrix, has a
unique solution if and only if $\mathbf{A}$ is non-singular
(i.e. $\det \mathbf{A} \neq 0$).


Such a system of equations is said to be a non-singular system. 


In the non-singular case, the unique solution of the system of equations
$\mathbf{A}\mathbf{x} = \mathbf{b}$ is given by
$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$, where $\mathbf{A}^{-1}$ is the inverse
of $\mathbf{A}$, as defined in Unit 4.
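A short derivation may make this formula less mysterious. It relies on the standard fact, assumed here rather than quoted from this unit, that for a square matrix the inverse works on either side, so $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$ as well as $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$.

\[
\begin{aligned}
\mathbf{A}\mathbf{x} = \mathbf{b}
&\;\Longrightarrow\; \mathbf{A}^{-1}(\mathbf{A}\mathbf{x}) = \mathbf{A}^{-1}\mathbf{b}
\;\Longrightarrow\; (\mathbf{A}^{-1}\mathbf{A})\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}
\;\Longrightarrow\; \mathbf{x} = \mathbf{A}^{-1}\mathbf{b}, \\
\mathbf{A}(\mathbf{A}^{-1}\mathbf{b}) &= (\mathbf{A}\mathbf{A}^{-1})\mathbf{b} = \mathbf{I}\mathbf{b} = \mathbf{b}.
\end{aligned}
\]

The first line shows that any solution must equal $\mathbf{A}^{-1}\mathbf{b}$, and the second confirms that $\mathbf{A}^{-1}\mathbf{b}$ really is a solution.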


It can be shown that the method of Gaussian elimination will always 
produce this solution, provided that we include any necessary interchanges 
of rows. Nonetheless, you may wonder why we have chosen to introduce 
Gaussian elimination when we could have used the methods of Unit 4 to 
construct the inverse of the coefficient matrix, and then used matrix 
multiplication to find the solution $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$. The reason concerns the
efficiency of the calculation, which becomes an increasingly important 
consideration as the number of equations grows and the order of the 
coefficient matrix increases. 


The method for finding a matrix inverse that was described in the previous 
unit involved a lot of manipulation. Using this method to find the inverse 
of the n × n coefficient matrix A might be sensible if n is small, e.g. n = 3
or n = 4, but is a bad idea when n is large, for at least two reasons. 


• The method given in Unit 4 takes of order $n!$ multiplications to
compute $\mathbf{A}^{-1}$. Other methods, based on row operations, take
roughly $n^3$ arithmetic operations. Since $n^3$ increases far more slowly
than $n!$ as $n$ gets large, the methods based on row operations are much
more efficient.


• Even if we use a faster method to compute $\mathbf{A}^{-1}$, we may find
that we require high-precision multiplication to achieve only modest
accuracy in our result, since there may be large cancellations between the
many terms summed over in our calculations.


For these (and other) reasons, the methods of Unit 4 are not those that are
implemented by software packages that solve linear systems of equations.
Gaussian elimination is widely regarded as a more efficient method for
solving medium to large systems of linear equations, but there are
exceptions to this, as the following box indicates.


$5! = 120$ and $6! = 720$, while $5^3 = 125$ and $6^3 = 216$.
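To get a feel for how quickly the gap between $n!$ and $n^3$ opens up, the two quantities can simply be tabulated. The sketch below does only that; the function name `growth_table` is hypothetical, and the figures are order-of-magnitude indicators rather than exact operation counts for any particular method.

```python
from math import factorial

def growth_table(sizes):
    """Return (n, n!, n**3) for each matrix order n in sizes."""
    return [(n, factorial(n), n ** 3) for n in sizes]

for n, fact, cube in growth_table([3, 4, 5, 6, 10, 20]):
    print(f"n = {n:2d}   n! = {fact:>20d}   n^3 = {cube:>6d}")
```

The two counts are comparable near $n = 5$, but by $n = 20$ the factorial has nineteen digits while the cube is still only 8000.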

Figure 3. A sparse matrix of order 100 × 100. The value of each element is
indicated by the colour of the corresponding pixel (i.e. picture element).
By far the most common value is zero, which is indicated by white.


By this stage it is assumed that you can envisage the relevant 2 × 2
coefficient matrix without needing to write it down. If this is not the
case, treat this example as a further exercise.

Gaussian elimination and sparse matrices


A large matrix A in which many of the elements are zero is said to be
a sparse matrix. Such matrices arise in various applications, ranging
from the modelling of engineering structures and electrical networks,
to pure mathematical studies in the field known as number theory.
An example of a sparse matrix is shown in Figure 3. In such cases,
Gaussian elimination may be an inefficient method for solving
Ax = b, since the development of an upper triangular matrix may
well create many more non-zero elements than were initially present.
Special methods have been devised for dealing with sparse matrices,
which can handle millions of linear equations.
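The creation of new non-zero elements during elimination can be seen in a small experiment. The sketch below uses a hypothetical 'arrow-shaped' test matrix (a diagonal matrix with one extra full row and one extra full column), which is an assumption made purely for illustration and is not taken from Figure 3. It counts the non-zero elements before and after reduction to upper triangular form, first with the full row and column placed first, then with them placed last.

```python
from fractions import Fraction

def arrow_matrix(n, dense_first=True):
    """Diagonal matrix with one full row and column (hypothetical test case)."""
    k = 0 if dense_first else n - 1
    m = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = Fraction(10)
        m[k][i] = m[i][k] = Fraction(1)
    m[k][k] = Fraction(10)  # restore the diagonal entry overwritten above
    return m

def eliminate(m):
    """Reduce a copy of m to upper triangular form (assumes non-zero pivots)."""
    rows = [row[:] for row in m]
    n = len(rows)
    for col in range(n):
        for r in range(col + 1, n):
            if rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * p for a, p in zip(rows[r], rows[col])]
    return rows

def nonzeros(m):
    """Count the non-zero elements of m."""
    return sum(1 for row in m for v in row if v != 0)

n = 20
for dense_first in (True, False):
    a = arrow_matrix(n, dense_first)
    print(dense_first, nonzeros(a), nonzeros(eliminate(a)))
```

In this particular test case the same equations produce a far denser triangular matrix in one ordering than in the other, which hints at why methods designed for sparse matrices pay attention to the order in which unknowns are eliminated.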


To end this discussion of Gaussian elimination in non-singular cases, we 
note that, despite its general efficiency, Gaussian elimination should not be 
thoughtlessly applied to every system of linear equations. In particular, 
you should not apply it to a system consisting of just two equations, since 
such a small system is more easily analysed without the formal apparatus 
of Gaussian elimination. 


1.4 Gaussian elimination: singular cases 


If det A = 0, then we say that A is singular, and we also use that term to 
describe the corresponding system of linear equations Ax = b. In some 
singular cases there may be no solution; in others there may be an infinity 
of solutions. Some 2 x 2 examples will help us to see why. 


Example 5 


For each of the following pairs of simultaneous equations, write down the 
determinant of the corresponding coefficient matrix and determine whether 
there is no solution, a unique solution, or an infinity of solutions. 


(a) $x + y = 4$,
    $2x - 3y = 8$.

(b) $x + y = 4$,
    $2x + 2y = 5$.

(c) $x + y = 4$,
    $2x + 2y = 8$.


Solution


(a) In this case the determinant of the coefficient matrix is $-3 - 2 = -5$,
so the system is non-singular and there is a unique solution.


(b) The determinant of the coefficient matrix is $2 - 2 = 0$, so the system is
singular. Subtracting twice the first equation from the second, we
obtain $0 = -3$, which is impossible. Hence there is no solution. (The
equations are said to be inconsistent.)


(c) The determinant of the coefficient matrix is again $2 - 2 = 0$, so the
system is singular. Subtracting twice the first equation from the
second, we now obtain $0 = 0$, which is indeed true. The equations are
consistent in this case. In fact, the second equation is simply the first
equation multiplied throughout by 2. Consequently, any pair of values
that satisfies the first equation will also satisfy the second equation.
There are infinitely many such pairs (choose any value for $x$ and then
let $y = 4 - x$). Hence there is an infinity of solutions.
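The reasoning used in Example 5 can be summarised as a small decision procedure. The sketch below is one possible way to code it for a pair of equations; the function name `classify_2x2` is hypothetical, and the code assumes that the first equation has at least one non-zero coefficient.

```python
def classify_2x2(a, b, c, d, p, q):
    """Classify the system  a*x + b*y = p,  c*x + d*y = q.

    Assumes the first equation has at least one non-zero coefficient.
    """
    det = a * d - b * c
    if det != 0:
        return "unique solution"
    # Singular case: the left-hand sides are proportional, so the equations
    # are consistent only if the right-hand sides are in the same proportion.
    if a * q - c * p == 0 and b * q - d * p == 0:
        return "infinity of solutions"
    return "no solution"

for label, system in [("a", (1, 1, 2, -3, 4, 8)),
                      ("b", (1, 1, 2, 2, 4, 5)),
                      ("c", (1, 1, 2, 2, 4, 8))]:
    print(label, classify_2x2(*system))
```

Note that the determinant alone settles only the first branch; the singular cases need the extra comparison involving the right-hand sides.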


Note that in the example above, the evaluation of the determinant was 
sufficient to tell us whether a unique solution existed. However, in the 
singular cases, where there was no unique solution, the value of the 
determinant did not tell us whether there was no solution or an infinity of 
solutions. Further investigation is always needed to resolve such cases. 
This is true irrespective of the order of A. We can gain some geometrical 
insight into these different behaviours by recalling that a linear equation in 
two variables can be represented graphically by a straight line. We 
generally write the equation of such a line as y = mx +c, where m 
represents the gradient of the line, and c represents the intersection with 
the y-axis. 


The lines corresponding to the equations of Example 5(a) are depicted in 
Figure 4(a), and are seen to intersect at a unique point. The values of x 
and y at the intersection represent the unique solution to the linear 
equations. The lines represented by the equations of Example 5(b) are 
parallel (see Figure 4(b)), so they never meet and there is no pair of values 
that simultaneously satisfies the two equations. Finally, in Figure 4(c), the 
two lines, though superficially different, are actually the same, so any of 
the infinity of points on one line will also be on the other. Hence there is 
an infinity of solutions. 


A linear equation in three variables (e.g. ax + by + cz = d) represents a 
plane in a three-dimensional system of Cartesian coordinates. The 
constants (a, b, c and d) will determine the orientation of the plane and its 
perpendicular distance from the origin (the plane passes through the origin 
if d= 0). A system of three linear equations with a non-singular coefficient 
matrix of order 3 x 3 describes three planes that intersect at a unique 
point. The coordinates of this point represent the unique solution of the 
non-singular system of equations. This is illustrated in Figure 5(a). 


Figures 5(b) and 5(c) show cases in which the planes meet in two or three 
parallel lines. These are some of the cases where the determinant of the 
coefficient matrix will be zero, and the singular system of equations will 
have no solution. Another singular case in which there is no solution is 
illustrated in Figure 5(d), where the three planes are parallel and do not 
intersect at all. 


In Figure 5(e) we see a case where there are infinitely many solutions since 
all three planes share a common line. (The case of all three planes 
coinciding would also represent an infinity of solutions.) In these cases the 
coefficient matrix is again singular. 


This geometrical interpretation can be extended to higher dimensions 
using the idea of a hyperplane, but visualisation is increasingly difficult so 
we will not pursue that approach here. In any case, the basic point has 
already been well established and can be summarised as follows. 


Figure 4 In two dimensions, a pair of simultaneous equations in two
variables represents a pair of straight lines. Such lines may (a) intersect
at a unique point, (b) be parallel and not intersect at all, or (c) be
identical and intersect at an infinity of points.


Figure 5 In three dimensions, the planes that represent three linear
equations may (a) intersect at a unique point, (b) intersect in two parallel
lines, (c) intersect in three parallel lines, (d) not intersect at all, or
(e) intersect in the same line or plane.




In the case of singular systems, 
this exercise requires you to go 
no further than considering the 
number of solutions that a 
system will have. Examples and 
exercises that involve 
determining such solutions will 
be given in Section 3. 




Figure 6 Carl Friedrich 
Gauss (1777-1855) 

The system of linear equations represented by the matrix equation
Ax = b is said to be singular when the square matrix A is singular
(i.e. when det A = 0). Such a system does not possess a unique
solution; it may have no solution or an infinity of solutions.


The following exercise is important. It should help you to recognise the 
possible cases, particularly those in which there is an infinity of solutions. 


Exercise 7 


For each of the following systems of simultaneous equations, determine 
whether there is no solution, a unique solution, or an infinity of solutions. 
(Hint: Examining determinants will not be sufficient; you are advised to 
consider the augmented matrices.) 
(a) $x_1 - 2x_2 + 5x_3 = 7$,
    $x_1 + 3x_2 - 4x_3 = 20$,
    $x_1 + 18x_2 - 31x_3 = 40$.

(b) $x_1 - 2x_2 + 5x_3 = 6$,
    $x_1 + 3x_2 - 4x_3 = 7$,
    $2x_1 + 6x_2 - 12x_3 = 12$.

(c) $x_1 - 4x_2 + x_3 = 14$,
    $5x_1 - x_2 - x_3 = 2$,
    $6x_1 + 14x_2 - 6x_3 = -52$.


Gaussian elimination — a historical perspective 


Carl Friedrich Gauss (Figure 6) is widely regarded as one of the 
greatest mathematicians of all time, comparable with Archimedes and 
Newton. He spent most of his career as Professor of Astronomy and 
Director of the Observatory at the University of Göttingen, in Lower
Saxony, now part of Germany. One of his many celebrated 
achievements was the determination of the orbit of the asteroid Pallas 
from a small number of observations in 1802. In 1810 he published an 
important paper about orbit determinations in which he used 
repeated observations of Pallas to make the best possible 
determination of its orbit. During this work he took a systematic 
approach to finding the six unknowns in the system of equations that 
represented the observations. In doing so, he devised a notation that 
became widely adopted. It is from this that the method of Gaussian 
elimination gets its name. 




Interestingly, the current — rather broad — use of the term ‘Gaussian 
elimination’ may not be entirely justified, even if we ignore the 
evidence that priority belongs to the Chinese. It appears that the 
name of Gauss became attached to the general method of systematic 
elimination only in the 1950s, mainly as a result of confusion among 
mathematicians who were unaware of the method’s detailed history. 
The original inventor (at least as far as Europe is concerned) seems to 
have been Isaac Newton (Figure 7), who pointed out in 1670 that 
algebra textbooks lacked a procedure for solving systems of 
simultaneous linear equations, and proceeded to supply one. 


Figure 7 Isaac Newton 
(1642-1727) 


2 Introducing eigenvalues and eigenvectors


As mentioned in the Introduction, eigenvalues and eigenvectors are scalars
(often numbers) and vectors associated with a matrix (the German word
‘eigen’ may be translated as ‘inherent’). They are widely used in the
mathematical sciences, particularly physics. Rather than going directly to
the definitions, we will first examine some examples of how they arise. We
start with what is essentially an algebraic example, based on a
mathematical model of population growth. This should help to establish a
clear view of an eigenvector. Then, in a second example that is more
geometrically based, we give particular emphasis to the idea of an
eigenvalue.


‘Eigen’ is pronounced as ‘eye-ghen’.


2.1 Eigenvectors in a model of population growth 


Consider the (fictional) towns of Exton and Wyeville, which have a regular
interchange of population: each year, one-tenth of Exton’s population
migrates to Wyeville, while one-fifth of Wyeville’s population migrates to
Exton (see Figure 8). Other changes in population, such as births, deaths
and other migrations, cancel each other and so can be ignored. If $x_n$
represents the population of Exton at the beginning of year $n$, and $y_n$ is
the corresponding population of Wyeville, then the populations of the two
towns at the beginning of year $n + 1$ are given by

\[
\begin{aligned}
x_{n+1} &= 0.9x_n + 0.2y_n, \\
y_{n+1} &= 0.1x_n + 0.8y_n,
\end{aligned}
\]

which may be expressed in matrix form as

\[
\begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix}
=
\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}
\begin{bmatrix} x_n \\ y_n \end{bmatrix}.
\]


Figure 8 Exton and Wyeville annually exchange fixed fractions of their
respective populations

This matrix equation can be represented symbolically by

\[
\mathbf{x}_{n+1} = \mathbf{T}\mathbf{x}_n,
\]

where the column vectors $\mathbf{x}_n$ and $\mathbf{x}_{n+1}$ represent the
populations in Exton and Wyeville at the beginning of years $n$ and $n + 1$,
respectively, and the square matrix $\mathbf{T}$ is known as the transition
matrix for the problem. (The entries in such a matrix are all non-negative,
and the entries in each column sum to 1.)


This annual exchange of populations is an example of an iterative process, 
in which the values associated with the (n + 1)th iterate can be 
determined from the values associated with the nth iterate. 


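Suppose that initially the population of Exton is 10000 and that of
Wyeville is 8000, i.e.
$\mathbf{x}_0 = [x_0 \;\; y_0]^T = [10000 \;\; 8000]^T$. Then after one
year (i.e. at the beginning of year $n = 1$), the populations will be given by
$\mathbf{x}_1 = \mathbf{T}\mathbf{x}_0$, so

\[
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
=
\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}
\begin{bmatrix} 10000 \\ 8000 \end{bmatrix}
=
\begin{bmatrix} 10600 \\ 7400 \end{bmatrix},
\]

and after two years they are given by $\mathbf{x}_2 = \mathbf{T}\mathbf{x}_1$, so

\[
\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}
\begin{bmatrix} 10600 \\ 7400 \end{bmatrix}
=
\begin{bmatrix} 11020 \\ 6980 \end{bmatrix}.
\]

Note that we might also have written this last result as

\[
\mathbf{x}_2 = \mathbf{T}\mathbf{x}_1 = \mathbf{T}\mathbf{T}\mathbf{x}_0 = \mathbf{T}^2\mathbf{x}_0,
\]

where we have introduced the power notation $\mathbf{T}^2$ to indicate repeated
(matrix) multiplication, just as we use powers such as $a^2$, $a^3$ and $a^4$ to
indicate the repeated multiplication of a scalar $a$. Using the idea of the
power of a matrix to represent repeated matrix multiplication, we can
describe the populations of Exton and Wyeville after $n$ years as

\[
\mathbf{x}_n = \mathbf{T}^n\mathbf{x}_0. \tag{2}
\]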


Given this relation, how do the populations of Exton and Wyeville change 
over time? The answers can be worked out using equation (2), though the 
repeated matrix multiplications are tedious to do by hand. To save you the 
labour, the results are indicated graphically in Figure 9. 


As you can see, the populations show a very interesting kind of behaviour. 
Initially they diverge, changing significantly from year to year, with Exton 
growing while Wyeville shrinks. However, those changes diminish with 
time, and after about 15 years the populations change very little, 
approaching ever more closely 12000 in Exton and 6000 in Wyeville. 


Explicit calculations (with results displayed to only the nearest integer)
show that $\mathbf{x}_{29} = \mathbf{x}_{30} = \mathbf{x}_{31} = [12000 \;\; 6000]^T$.
So after about 30 years, the two populations are very stable, despite
continuing annual migrations.
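The repeated multiplications in equation (2) are tedious by hand but easy to automate. The sketch below iterates the population model year by year using exact fractions; the helper name `step` and the list `history` are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

def step(t, x):
    """Apply the transition matrix t to the population vector x."""
    return [sum(t[i][j] * x[j] for j in range(len(x))) for i in range(len(t))]

T = [[Fraction(9, 10), Fraction(2, 10)],
     [Fraction(1, 10), Fraction(8, 10)]]

history = [[Fraction(10000), Fraction(8000)]]  # populations at year 0
for year in range(31):
    history.append(step(T, history[-1]))

for n in (1, 2, 15, 29, 30, 31):
    print(n, [round(v) for v in history[n]])
```

Rounding is applied only when printing, so the stored values remain exact and the slow approach to the steady state is not hidden by accumulated rounding error.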


Figure 9 The evolving populations of Exton and Wyeville


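Clearly there is something special about the relationship between the
‘steady-state’ population vector
$\mathbf{x} = [x \;\; y]^T = [12000 \;\; 6000]^T$ and the transition matrix
$\mathbf{T}$ for this particular problem. It is easy to see the nature
of that special relationship by simply working out the matrix product of
$\mathbf{T}$ and $\mathbf{x}$:

\[
\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}
\begin{bmatrix} 12000 \\ 6000 \end{bmatrix}
=
\begin{bmatrix} 0.9 \times 12000 + 0.2 \times 6000 \\ 0.1 \times 12000 + 0.8 \times 6000 \end{bmatrix}
=
\begin{bmatrix} 12000 \\ 6000 \end{bmatrix}.
\]

Expressed in symbolic terms,

\[
\mathbf{T}\mathbf{x} = \mathbf{x},
\]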


showing that the action of T on x leaves x unchanged. This obviously 
explains the stability of the steady-state populations, since repeated 
applications of the transition matrix will continue to produce the same 
outcome. This special relationship between x and T is described by saying 
that x is an eigenvector of T. 


In fact, this particular example of an eigenvector is very special indeed.
Rather more common are cases where $\mathbf{A}$ is a square matrix (not
necessarily a transition matrix) and $\mathbf{y}$ is a column vector such that

\[
\mathbf{A}\mathbf{y} = \lambda\mathbf{y},
\]

where $\lambda$ is a constant scalar. In these cases we would still describe
$\mathbf{y}$ as an eigenvector of $\mathbf{A}$, but we would now say that
$\lambda$ is the corresponding eigenvalue. In the case of the population model
considered above, the eigenvalue $\lambda$ is 1, because
$\mathbf{T}\mathbf{x} = 1\mathbf{x}$. In the next subsection we will consider
another example of an eigenvector, but this time we will be specifically
concerned with situations in which the corresponding eigenvalue is not 1.

Exercise 8


Use matrix multiplication to determine which, if any, of the following
column vectors is an eigenvector of the transition matrix

\[
\mathbf{T} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}.
\]


ts Fed (b) 0 (0) a (a) a 


It is clear from this exercise that an eigenvector of a square matrix is not
unique. In fact, given any eigenvector $\mathbf{x}$ of a square matrix, any
scaled column vector $\mathbf{y} = k\mathbf{x}$, where $k$ is a non-zero scalar
constant, will also be an eigenvector of that square matrix. Moreover,
$\mathbf{x}$ and $\mathbf{y}$ will correspond to the same eigenvalue $\lambda$.
(It is important not to confuse the scalar multiple $k$ discussed here with
the eigenvalue $\lambda$ that we will discuss in the next subsection.)
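The reason why scaling preserves the eigenvector property takes only one line of matrix algebra. Suppose, using the notation of Subsection 2.1, that $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$ and that $k$ is a non-zero scalar. Then

\[
\mathbf{A}(k\mathbf{x}) = k(\mathbf{A}\mathbf{x}) = k(\lambda\mathbf{x}) = \lambda(k\mathbf{x}).
\]

So $k\mathbf{x}$ satisfies the same relation as $\mathbf{x}$, with the same scalar $\lambda$. The condition $k \neq 0$ simply ensures that $k\mathbf{x}$ is not the zero vector.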


2.2 Eigenvalues in a transformation of the plane 


In Unit 4 we saw that a 2 × 2 matrix $\mathbf{A} = [a_{ij}]$ can be interpreted
geometrically as representing a linear transformation of the Cartesian
plane. Such transformations map a point with the Cartesian coordinates
$(x_0, y_0)$ to a point with the Cartesian coordinates $(x_1, y_1)$, where
$x_1 = a_{11}x_0 + a_{12}y_0$ and $y_1 = a_{21}x_0 + a_{22}y_0$. In terms of
matrices we can write this as

\[
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}.
\]

Geometrically, we may regard the coordinates that appear in the column
vector on the right as the components of a position vector,
$\mathbf{r}_0 = (x_0, y_0) = x_0\mathbf{i} + y_0\mathbf{j}$, where $\mathbf{i}$ and
$\mathbf{j}$ are the unit vectors in the x- and y-directions, respectively.
Thus, in geometric terms, the action of the matrix $\mathbf{A}$ is to transform
or ‘map’ one vector $\mathbf{r}_0$ into another vector $\mathbf{r}_1$.


With this geometric view in mind, consider the linear transformation 
specified by the matrix 


\[
\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}.
\]

Using matrix multiplication, it is easy to see that this particular
transformation will map the unit vector i = (1,0) to the vector (3,1), and
the unit vector j = (0,1) to the vector (2,4). (If this is not clear, you
should explicitly work out the matrix products Ai and Aj, with i and j
interpreted as column vectors.) These transformations are illustrated in
Figure 10, along with the effect of the transformation on a general vector
r = (x,y) that is mapped to the vector (3x + 2y, x + 4y).

Figure 10 The action of the linear transformation represented by A on
(a) the unit vector i, (b) the unit vector j, and (c) the general vector
r = (x,y). In each case the initial vector is red and the transformed vector
is blue.


Note that the action of the linear transformation on a general vector is to 
change its direction and its magnitude. 


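Now consider the action of $\mathbf{A}$ on the column vector
$\mathbf{w} = [1 \;\; 1]^T$:

\[
\mathbf{A}\mathbf{w}
= \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 5 \\ 5 \end{bmatrix}.
\]

According to the rules of matrix algebra, we can write this last result as

\[
\mathbf{A}\mathbf{w}
= \begin{bmatrix} 5 \\ 5 \end{bmatrix}
= 5\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= 5\mathbf{w}.
\]

This shows that $\mathbf{w}$ is an eigenvector of $\mathbf{A}$, and that it
corresponds to an eigenvalue of 5. The geometric interpretation of this
result is shown in Figure 11; the transformation maps an eigenvector
$\mathbf{w}$ into the scaled vector $5\mathbf{w}$, preserving its direction but
altering its magnitude.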


Exercise 9 


Use matrix multiplication to confirm that the linear transformation 
represented by A maps the scaled vector kw, where k is any non-zero 
scalar, into the vector 5kw, and comment on the significance of this result. 


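The vector $\mathbf{w}$ and its scaled versions $k\mathbf{w}$ are not the only
eigenvectors of $\mathbf{A}$, nor is 5 the only eigenvalue. Using matrix
multiplication, it is also easy to show that $\mathbf{z} = [-2 \;\; 1]^T$ is an
eigenvector, and that in this case the corresponding eigenvalue is 2:

\[
\mathbf{A}\mathbf{z}
= \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} -2 \\ 1 \end{bmatrix}
= \begin{bmatrix} -4 \\ 2 \end{bmatrix}
= 2\begin{bmatrix} -2 \\ 1 \end{bmatrix}
= 2\mathbf{z}.
\]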


This means, of course, that any non-zero scaled vector of the form $k\mathbf{z}$
will also be an eigenvector of $\mathbf{A}$, corresponding to the eigenvalue 2.
This is why we describe $\mathbf{z} = [-2 \;\; 1]^T$ as an eigenvector
corresponding to eigenvalue 2, rather than the eigenvector corresponding to
that eigenvalue. (You might like to confirm for yourself that
$[4 \;\; {-2}]^T$ and $[-1 \;\; \frac{1}{2}]^T$ are also eigenvectors that
correspond to the eigenvalue 2, since they are each of the form $k\mathbf{z}$,
with $k = -2$ and $k = \frac{1}{2}$, respectively.)
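Checks of this kind are easy to automate. The sketch below tests whether a given non-zero vector is an eigenvector of a matrix and, if so, returns the corresponding eigenvalue. The function names `matvec` and `eigenvalue_for` are hypothetical, and the code assumes that the entries are integers or exact fractions.

```python
from fractions import Fraction

def matvec(a, v):
    """Multiply the matrix a by the column vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def eigenvalue_for(a, v):
    """Return the scalar lam with a v = lam v, or None if there is none.

    The vector v must be non-zero.
    """
    av = matvec(a, v)
    # Read off the candidate scale factor from the first non-zero component.
    i = next(k for k, c in enumerate(v) if c != 0)
    lam = Fraction(av[i]) / Fraction(v[i])
    # v is an eigenvector only if every component is scaled by the same factor.
    if all(av[k] == lam * v[k] for k in range(len(v))):
        return lam
    return None

A = [[3, 2], [1, 4]]
print(eigenvalue_for(A, [1, 1]))                 # the vector w
print(eigenvalue_for(A, [-2, 1]))                # the vector z
print(eigenvalue_for(A, [-1, Fraction(1, 2)]))   # a scaled version of z
print(eigenvalue_for(A, [1, 0]))                 # the unit vector i
```

The last call returns `None`, in agreement with the earlier observation that A maps i = (1,0) to (3,1), which points in a different direction.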


Figure 11 A maps its eigenvector w into the scaled vector 5w

The term real matrix means that the matrix has real elements.

The vectors w and z together with their scaled variants comprise all the
possible eigenvectors of A, and the eigenvalues 2 and 5 are the only 
eigenvalues of A. So the particular transformation that we have been 
examining has two distinct eigenvalues, both of which happen to be 
positive. It should be noted, however, that this is not always the case. 
Eigenvalues may be positive or negative or zero, and real, imaginary or 
complex. As you will see later, a matrix may even have repeated 
eigenvalues, so while each eigenvector corresponds to a single eigenvalue, 
there is no guarantee that each eigenvector corresponds to a different 
eigenvalue. 


Exercise 10 


In each of the following cases, verify that v is an eigenvector of A, and 
write down the corresponding eigenvalue. 


wa Bh 


wa Lh 
waft of 
waft Lh 


The real 2 x 2 matrix i 4 


_ 1 d _ 1 
Vi = img an v2 ieavg|* 
Show that the eigenvalues of the matrix are $1 - 2i$ and $1 + 2i$, and
determine which eigenvalue corresponds to which eigenvector.


| has the complex eigenvectors 


Sometimes it is possible to deduce information about the eigenvectors and 
eigenvalues of a given matrix from its geometric effects. This is so for each 
of the three cases considered in Figure 12. In each case, the geometric 
action of the matrix is illustrated by its effect on a unit square with 
vertices (0,0), (1,0), (1,1) and (0,1). Also shown is the effect of the 
transformation on the perpendicular unit vectors drawn along two 
adjoining sides of the unit square. 


Matrix | Comment | Eigenvectors | Eigenvalues

$\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$ | A scaling by 3 in the x-direction and by 2 in the y-direction (i.e. a (3,2) scaling) | $[1 \;\; 0]^T$, $[0 \;\; 1]^T$ | 3, 2

$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ | A reflection in the x-axis | $[1 \;\; 0]^T$, $[0 \;\; 1]^T$ | 1, $-1$

$\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$ | A rotation through $\pi/4$ anticlockwise about the origin | no real eigenvectors | no real eigenvalues


Figure 12 Three matrices representing transformations of the plane, 
together with their real eigenvectors and corresponding eigenvalues 


In the case of the matrix $\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$, it is
clear from the geometric properties of the linear transformation that
vectors along the coordinate axes $\mathbf{i} = [1 \;\; 0]^T$ and
$\mathbf{j} = [0 \;\; 1]^T$ are eigenvectors, as these are transformed to
vectors in the same directions. Vector i (and its scalar multiples) has
eigenvalue 3, and vector j (and its scalar multiples) has eigenvalue 2.


In the case of the matrix $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, we
see that reflection in the x-axis leaves the vector represented by
$\mathbf{i} = [1 \;\; 0]^T$ unchanged, and reverses the direction of
$\mathbf{j} = [0 \;\; 1]^T$, so these must be eigenvectors corresponding to the
eigenvalues 1 and $-1$, respectively.


In the third case of Figure 12, the matrix describes rotation through $\pi/4$
anticlockwise about the origin. In this case we would not expect to find 
any real eigenvectors because the direction of every vector is changed by 
the linear transformation. However, even this matrix does have 
eigenvectors and eigenvalues; they simply happen to involve complex 
quantities. This puts them beyond the kind of simple geometric 
interpretation that we are using here, though they can be studied using the 
algebraic methods that will be introduced in Section 3. 
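The three cases of Figure 12 can also be checked numerically. The sketch below anticipates the algebraic approach by using a standard result that is assumed here rather than derived: the eigenvalues of a 2 × 2 matrix with rows (a, b) and (c, d) are the roots of the quadratic $\lambda^2 - (a + d)\lambda + (ad - bc) = 0$. The function name `eigenvalues_2x2` is hypothetical.

```python
import cmath
import math

def eigenvalues_2x2(a, b, c, d):
    """Roots of  lam**2 - (a + d)*lam + (a*d - b*c) = 0  as complex numbers."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)  # complex square root
    return (trace + root) / 2, (trace - root) / 2

s = 1 / math.sqrt(2)
print(eigenvalues_2x2(3, 0, 0, 2))    # scaling
print(eigenvalues_2x2(1, 0, 0, -1))   # reflection in the x-axis
print(eigenvalues_2x2(s, -s, s, s))   # rotation through pi/4
```

For the scaling and the reflection the imaginary parts are zero, while for the rotation the two roots form a complex conjugate pair of modulus 1, which is consistent with the absence of real eigenvectors.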

The restriction to non-zero column vectors means that we never have to
deal with an eigenvector of the form
$\mathbf{v} = [0 \;\; 0 \;\; \cdots \;\; 0]^T$.

Exercise 12


The matrix $\mathbf{A} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
corresponds to reflection in a line through the origin at an angle $\pi/4$ to
the x-axis. What are the eigenvectors of $\mathbf{A}$ and their corresponding
eigenvalues?


(Hint: Find two lines through the origin that are transformed to 
themselves by the reflection, then consider what happens to a point on 
each line.) 


2.3 The eigenvalue equation 


Having seen several examples, we are now in a good position to write down 
some formal definitions and list some of the properties of eigenvectors and 
eigenvalues. We begin with the simple but very important relationship 
often referred to as the eigenvalue equation. 


Eigenvalue equation 


Let A be any square matrix. A non-zero column vector v is an 
eigenvector of A if it satisfies the eigenvalue equation for A: 


$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v} \quad \text{for some scalar } \lambda. \tag{3}$$


The scalar λ is said to be the eigenvalue of A that corresponds to
the eigenvector v.
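
The eigenvalue equation can be tested numerically for any candidate vector. The following is a minimal sketch in Python, not part of the original text; the function names `mat_vec` and `eigenvalue_if_eigenvector` are illustrative choices, and the test matrix is the one from Subsection 2.2 with eigenvalues 5 and 2.

```python
def mat_vec(A, v):
    """Multiply a matrix (given as a list of rows) by a column vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def eigenvalue_if_eigenvector(A, v, tol=1e-12):
    """Return lambda if A v = lambda v for the non-zero vector v, otherwise None."""
    if all(abs(x) <= tol for x in v):
        raise ValueError("the zero vector is never an eigenvector")
    w = mat_vec(A, v)
    # Estimate lambda from the component of v with the largest modulus.
    i = max(range(len(v)), key=lambda j: abs(v[j]))
    lam = w[i] / v[i]
    # v is an eigenvector only if every component is scaled by the same factor.
    if all(abs(wj - lam * vj) <= tol for wj, vj in zip(w, v)):
        return lam
    return None


A = [[3, 2], [1, 4]]                           # matrix from Subsection 2.2
print(eigenvalue_if_eigenvector(A, [1, 1]))    # 5.0
print(eigenvalue_if_eigenvector(A, [-2, 1]))   # 2.0
print(eigenvalue_if_eigenvector(A, [1, 0]))    # None: not an eigenvector
```

The tolerance `tol` is an assumption made to cope with floating-point rounding; for matrices with integer entries an exact comparison would also work.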


Exercise 13 


Confirm that the transition matrix (considered in Subsection 2.1) 


$$\mathbf{T} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}$$


has eigenvectors $[2 \;\; 1]^T$ and $[1 \;\; -1]^T$, and find the corresponding
eigenvalues.


If we count all the scaled versions of an eigenvector as equivalent, it can be
shown that no n × n matrix may have more than n eigenvectors and
n eigenvalues. Thus the two eigenvalues of T quoted in Exercise 13 are the
only eigenvalues of T, and the two given eigenvectors are its only
eigenvectors (apart from their scalar multiples).


When dealing with a general n × n matrix, there are several complications
that can arise when counting the number of eigenvalues and eigenvectors.
We will say something about those complications in Section 3. For the
present, however, we avoid them by considering only the simplest and most
common case, in which an n × n matrix has n distinct eigenvalues and
n distinct eigenvectors. This is what many would regard as the ‘normal’
case, so it is natural to give it particular attention. Even so, we need to be
clear about what we mean by ‘distinct’ eigenvalues and ‘distinct’
eigenvectors.


As far as eigenvalues are concerned, the notion of ‘distinct’ is simple: it 
just means different, i.e. unequal. The situation regarding eigenvectors is 
not so straightforward. We have already indicated that to be regarded as 
distinct, an eigenvector must not be a scalar multiple of any other 
eigenvector. However, with future developments in mind, we really need to 
go beyond this to define what we mean by distinct eigenvectors. In fact, 
we require a new concept called linear independence, which is the topic of 
the next subsection. 


2.4 Linear independence 


For the moment, let us forget about eigenvectors and just think about 
vectors in general. Linear independence is a property of sets of vectors. We 
begin with a simple geometric approach, first for two vectors, then for 
three vectors. 


Two (non-zero) vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are said to be linearly dependent if they
are collinear, i.e. one is parallel or antiparallel to the other. Conversely,
$\mathbf{v}_1$ and $\mathbf{v}_2$ are said to be linearly independent if they are not collinear
(see Figure 13).


Now consider the case of three vectors. Three (non-zero) vectors v1, v2 
and v3 are said to be linearly dependent if they are coplanar, i.e. they all 
lie in the same plane. Conversely, v1, v2 and v3 are said to be linearly 
independent if they are not coplanar (see Figure 14). 


Figure 13 (a) Two linearly dependent vectors. (b) Two linearly independent
vectors.


Figure 14 (a) A plane (in blue), containing three linearly dependent 
vectors. (b) Three linearly independent vectors in three-dimensional space. 


Considering pairs and triples of vectors is as far as we can go using the 
geometric approach. So let us introduce an algebraic approach that will 
allow us to generalise to any number of vectors. 


As stated, two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly dependent if they are
collinear, which is equivalent to $\mathbf{v}_1 = k\mathbf{v}_2$ for some number k. So they are
linearly independent if $\mathbf{v}_1 \neq k\mathbf{v}_2$ for any k. Another way of saying this is:


$\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent if the only solution of the equation

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 = \mathbf{0} \quad \text{(where $a_1$ and $a_2$ are numbers)} \tag{4}$$

is $a_1 = a_2 = 0$.

(This is equivalent to the previous statement because if we could find a
solution of (4) with, say, $a_1 \neq 0$, then $\mathbf{v}_1 = k\mathbf{v}_2$ with $k = -a_2/a_1$.)


Now, three vectors $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ are linearly dependent if they are
coplanar. This is equivalent to saying that one vector can be expressed as
a linear combination of the other two, say $\mathbf{v}_1 = k_2\mathbf{v}_2 + k_3\mathbf{v}_3$ for some
numbers $k_2$ and $k_3$ (for example, in Figure 14(a), the vectors satisfy
$\mathbf{v}_2 = \mathbf{v}_1 + \mathbf{v}_3$). The vectors are linearly independent if this is not possible.
Mathematically we state this as follows: $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}_3$ are linearly
independent if the only solution of the equation

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 = \mathbf{0} \quad \text{(where the $a_i$ are numbers)} \tag{5}$$

is $a_1 = a_2 = a_3 = 0$.

(If we could find a solution of equation (5) with, say, $a_1 \neq 0$, then we
could write $\mathbf{v}_1 = -\dfrac{a_2}{a_1}\mathbf{v}_2 - \dfrac{a_3}{a_1}\mathbf{v}_3$.)


You should by now be noticing a pattern emerging, so we move to the
general case. n vectors $\mathbf{v}_i$ (i = 1, ..., n) are said to be linearly dependent if
one of the vectors can be expressed as a linear combination of the others,
e.g. $\mathbf{v}_1 = k_2\mathbf{v}_2 + k_3\mathbf{v}_3 + \cdots + k_n\mathbf{v}_n$ for some numbers $k_i$. If it is not
possible to do this, then we say that the vectors are linearly independent.
This is equivalent to the following definition.


Linear independence 


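n vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent if the only
solution of the equation

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_n\mathbf{v}_n = \mathbf{0} \quad \text{(where the $a_i$ are numbers)} \tag{6}$$

is $a_1 = a_2 = a_3 = \cdots = a_n = 0$.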


If the vectors are not linearly independent, then they are said to be 
linearly dependent, and one of the vectors can be expressed as a 
linear combination of the others. 


Example 6 


In two dimensions, are the unit vectors $\mathbf{i} = [1 \;\; 0]^T$ and $\mathbf{j} = [0 \;\; 1]^T$ linearly
dependent or linearly independent?


Solution


Clearly, $a_1\mathbf{i} + a_2\mathbf{j}$ never gives the zero vector unless $a_1 = a_2 = 0$. Hence
the vectors are linearly independent.


This is also clear geometrically, since the vectors are neither parallel nor 
antiparallel. 


Example 7 


Show that the vectors $\mathbf{i} = [1 \;\; 0 \;\; 0]^T$, $\mathbf{j} = [0 \;\; 1 \;\; 0]^T$ and $\mathbf{v} = [3 \;\; 2 \;\; 0]^T$
are linearly dependent.


Solution

Since $\mathbf{v} = 3\mathbf{i} + 2\mathbf{j}$, we can write $\mathbf{v} - 3\mathbf{i} - 2\mathbf{j} = \mathbf{0}$. This is equivalent to
equation (6), with n = 3, $\mathbf{v}_1 = \mathbf{v}$, $\mathbf{v}_2 = \mathbf{i}$, $\mathbf{v}_3 = \mathbf{j}$, $a_1 = 1$, $a_2 = -3$ and
$a_3 = -2$. So the vectors are linearly dependent.


This is also clear geometrically, since all three vectors lie in the xy-plane.
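
For larger sets of vectors, the test in equation (6) can be automated. The sketch below is an illustration rather than part of the module text: it assumes the standard fact that the vectors are linearly independent exactly when the matrix having them as rows has rank equal to the number of vectors, and it finds the rank by row reduction in exact arithmetic. The function names `rank` and `linearly_independent` are hypothetical.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0  # number of pivot rows found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True if a1 v1 + ... + an vn = 0 forces every a_i to be zero."""
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0], [0, 1]]))                    # Example 6: True
print(linearly_independent([[1, 0, 0], [0, 1, 0], [3, 2, 0]]))   # Example 7: False
```

Using `Fraction` avoids the rounding errors that could make a dependent set look independent in floating-point arithmetic.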


Exercise 14 


Show that if there is a solution of equation (6) with $a_1 \neq 0$, then one of
the vectors can be written as a linear combination of the others.


Exercise 15 


Are the following sets of three-dimensional column vectors linearly 
independent or linearly dependent? Justify your answers. 


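(a) $\mathbf{i} = [1 \;\; 0 \;\; 0]^T$, $\mathbf{j} = [0 \;\; 1 \;\; 0]^T$, $\mathbf{k} = [0 \;\; 0 \;\; 1]^T$.
(b) $\mathbf{v}_1 = [1 \;\; -1 \;\; 0]^T$, $\mathbf{v}_2 = [-1 \;\; 1 \;\; 0]^T$, $\mathbf{v}_3 = [0 \;\; 0 \;\; 1]^T$.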


It is the concept of linear independence that really captures what we mean 
by ‘distinct’ eigenvectors: the eigenvectors of a matrix are said to be 
distinct if they are linearly independent. We return to this in the next 
subsection, but for now, let us continue our exploration of linear 
independence. 


Linear independence and dimension 


Linear independence of vectors is intimately connected with the dimension 
of the space in which they live. In fact, we often use the definition of linear 
independence to define what is meant by the dimension of a vector 
space. (The term vector space is often used to describe spaces of vectors 
in higher dimensions.) It should be fairly obvious that in two dimensions, 
we can have no more than two linearly independent vectors, because if we 
had three, they would all lie in the same plane and hence be linearly 
dependent. So in two dimensions, if we have two linearly independent
vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, then we can express any other vector as a linear
combination of them: $\mathbf{v} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$ (because $\mathbf{v}_1$, $\mathbf{v}_2$ and $\mathbf{v}$ are linearly
dependent); see Figure 15.


Likewise, in three dimensions we can have no more than three linearly
independent vectors. And if we have three linearly independent vectors $\mathbf{v}_1$,
$\mathbf{v}_2$ and $\mathbf{v}_3$, then we can express any other vector as a linear combination of
them: $\mathbf{v} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2 + \gamma\mathbf{v}_3$ (because $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ and $\mathbf{v}$ are linearly
dependent).


We use these observations to provide a definition of the dimension of a 
vector space. 


Figure 15 In two dimensions, any vector v may be written as a linear
combination of any two linearly independent vectors $\mathbf{v}_1$ and $\mathbf{v}_2$


Dimension and basis


The dimension of a vector space is equal to the maximum number of
linearly independent vectors that it allows.


In an n-dimensional vector space, if we have n linearly independent
vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, then we can express any other vector as a
linear combination of them:

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n. \tag{7}$$

The set of linearly independent vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is called a
basis for the n-dimensional vector space.

(This follows because in n dimensions, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly
independent and we add another vector $\mathbf{v}$, then $\mathbf{v}, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are
linearly dependent, hence $\mathbf{v}$ can be expressed as a linear combination of
the others.)

Example 8 


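The vectors $\mathbf{v}_1 = [1 \;\; 3]^T$ and $\mathbf{v}_2 = [2 \;\; -1]^T$ are linearly independent.
Express the vector $\mathbf{v} = [1 \;\; 1]^T$ as a linear combination of them.


Solution


Setting $\mathbf{v} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$, we have

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \alpha\begin{bmatrix} 1 \\ 3 \end{bmatrix} + \beta\begin{bmatrix} 2 \\ -1 \end{bmatrix},$$

from which we get the simultaneous linear equations

$$\begin{aligned} 1 &= \alpha + 2\beta, \\ 1 &= 3\alpha - \beta. \end{aligned}$$

Solving these, we obtain α = 3/7 and β = 2/7.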
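
The pair of simultaneous equations in Example 8 can also be solved mechanically. The sketch below is illustrative only: it uses Cramer's rule, which is one of several possible methods and is not the approach taken in the text, and the function name `expand_in_basis_2d` is hypothetical. It assumes vectors with integer components so that the result can be returned as exact fractions.

```python
from fractions import Fraction


def expand_in_basis_2d(v1, v2, v):
    """Return (alpha, beta) with v = alpha*v1 + beta*v2, using Cramer's rule.

    Assumes integer components, so the coefficients are exact fractions.
    """
    # Determinant of the matrix whose columns are v1 and v2.
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("v1 and v2 are linearly dependent, so they are not a basis")
    alpha = Fraction(v[0] * v2[1] - v2[0] * v[1], det)
    beta = Fraction(v1[0] * v[1] - v[0] * v1[1], det)
    return alpha, beta


alpha, beta = expand_in_basis_2d([1, 3], [2, -1], [1, 1])
print(alpha, beta)   # 3/7 2/7, as in Example 8
```

The determinant test at the start is the two-dimensional version of the linear independence condition: a zero determinant means that the two candidate basis vectors are collinear.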


Exercise 16 


Are the following three-dimensional column vectors linearly independent or 
linearly dependent? 


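$\mathbf{v}_1 = [1 \;\; 0 \;\; 2]^T$, $\mathbf{v}_2 = [1 \;\; -5 \;\; 0]^T$,
$\mathbf{v}_3 = [0 \;\; 4 \;\; -1]^T$, $\mathbf{v}_4 = [0 \;\; 0 \;\; -2]^T$.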


Exercise 17


The vectors $\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [-2 \;\; 1]^T$ are linearly independent.
Express the vector $\mathbf{v} = [1 \;\; 3]^T$ as a linear combination of them.


Until now, in two dimensions we have taken the (linearly independent)
unit Cartesian vectors i and j as the basis, and expressed every other
vector as a linear combination of them: $\mathbf{v} = v_x\mathbf{i} + v_y\mathbf{j}$. Similarly, in three
dimensions we have taken i, j and k as the basis, and expressed every
other vector as a linear combination of them: $\mathbf{v} = v_x\mathbf{i} + v_y\mathbf{j} + v_z\mathbf{k}$.
However, what we now know is that it is not necessary to use orthogonal
unit vectors as a basis. In n dimensions, any set of n vectors will suffice as
a basis, provided that they are all linearly independent.


Having explored the notion of linear independence, we now return to our 
main topic and apply these ideas (which apply to vectors in general) to the 
eigenvectors of a matrix. 


2.5 Application of linear independence to 
eigenvector expansions 


Recall that given an n × n matrix A, there is a set of numbers λ (the
eigenvalues) and a set of vectors v (the eigenvectors) that satisfy the
eigenvalue equation

$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}.$$

Furthermore, if v is an eigenvector (with eigenvalue λ), then so is the
scalar multiple kv (for any non-zero number k), so v and kv are not regarded
as distinct eigenvectors.


In the ‘usual’ case there are n distinct eigenvalues with n distinct 
eigenvectors. ‘Distinct eigenvalues’ means that they are not equal. We are 
now in a position to say what we mean by the term ‘distinct eigenvectors’. 


A matrix has n distinct eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ if they are linearly
independent.


Let us check that the 2 x 2 matrix discussed in Subsection 2.2 does indeed 
have only two linearly independent eigenvectors. 


Example 9 


In Subsection 2.2 we showed that the matrix $\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ has eigenvectors
of the form $[k \;\; k]^T$, corresponding to the eigenvalue 5, and eigenvectors of
the form $[-2k \;\; k]^T$, corresponding to the eigenvalue 2. Answer the
following questions regarding these eigenvectors, justifying each of your
answers.


(a) Are the eigenvectors $[1 \;\; 1]^T$ and $[2 \;\; 2]^T$ linearly dependent?
(These correspond to the first eigenvector, with k = 1 and k = 2,
respectively.)


(b) Are the eigenvectors $[-2 \;\; 1]^T$ and $[2 \;\; -1]^T$ linearly dependent?
(These correspond to the second eigenvector, with k = 1 and k = −1,
respectively.)


(c) Are the eigenvectors $[1 \;\; 1]^T$ and $[-2 \;\; 1]^T$ linearly dependent?
(These correspond to the first and second eigenvectors, both with
k = 1.)



This is a powerful statement, 
but its proof, although not 
difficult, is beyond the scope of 
this module. 

Solution

(a) Yes, since

$$2\,[1 \;\; 1]^T - [2 \;\; 2]^T = [0 \;\; 0]^T,$$

so the eigenvectors fail the test for linear independence, and are
therefore linearly dependent. This is also clear geometrically, since the
vectors are collinear.


(b) Yes, since

$$[-2 \;\; 1]^T + [2 \;\; -1]^T = [0 \;\; 0]^T,$$

so the eigenvectors fail the test for linear independence, and are
therefore linearly dependent. This is also clear geometrically, since the
vectors are collinear.


(c) No. For the vectors to be linearly dependent, we would need to be able
to find scalars $a_1$ and $a_2$, not both zero, such that

$$a_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + a_2\begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

For this to be possible, we require

$$\begin{aligned} a_1 - 2a_2 &= 0, \\ a_1 + a_2 &= 0. \end{aligned}$$

However, the only solution of this pair of equations is $a_1 = 0$ and
$a_2 = 0$, so the two eigenvectors corresponding to different eigenvalues
are linearly independent (and therefore distinct).


This example demonstrates that although $\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ has an infinity of
eigenvectors of the forms $[k \;\; k]^T$ and $[-2k \;\; k]^T$, there are only two
linearly independent eigenvectors. We are free to choose any non-zero value
of k. Choosing k = 1, we say that A has two linearly independent eigenvectors
$\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [-2 \;\; 1]^T$. In fact, since one only ever talks about
linearly independent eigenvectors, a statement such as this is often
abbreviated to ‘A has two eigenvectors, $\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [-2 \;\; 1]^T$’.


The obvious question is: when does an n x n matrix have n linearly 
independent eigenvectors? It turns out that if the eigenvalues are distinct, 
then so are their corresponding eigenvectors. 


If an n x n matrix has n distinct eigenvalues, then it has n linearly 
independent eigenvectors. 


In fact, in many (but not all) cases, even if the eigenvalues are not all 
distinct, we can still have n linearly independent eigenvectors. 

Exercise 18


The matrix

$$\mathbf{A} = \begin{bmatrix} 1 & -2 & 4 \\ -2 & 7 & -10 \\ -1 & 4 & -6 \end{bmatrix}$$

has distinct eigenvalues $\lambda_1 = -1$, $\lambda_2 = 1$, $\lambda_3 = 2$, and corresponding
eigenvectors $\mathbf{v}_1 = [-1 \;\; 1 \;\; 1]^T$, $\mathbf{v}_2 = [1 \;\; 2 \;\; 1]^T$ and $\mathbf{v}_3 = [0 \;\; 2 \;\; 1]^T$ (up
to a multiplicative constant). Show that the eigenvectors are linearly
independent.


Eigenvector expansions 


We have already seen that in an n-dimensional vector space, any set of
n linearly independent vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ can be used as a basis, i.e.
any vector can be expressed as a linear combination of the $\mathbf{v}_i$ as in
equation (7).


This means that in the ‘normal’ case that we are considering, when an
n × n matrix A has n linearly independent eigenvectors, we can use these
as the basis vectors. This is called an eigenvector expansion.


Eigenvector expansion 


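If A is an n × n matrix with n linearly independent eigenvectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, and v is any n-dimensional vector, then there exist
scalars $c_1, c_2, \ldots, c_n$ such that

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n. \tag{8}$$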


This is called the eigenvector expansion of v. 


Example 10 


We saw in Exercise 13 that in the population model of Subsection 2.1, the 
transition matrix 


$$\mathbf{T} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix}$$


has eigenvectors $[2 \;\; 1]^T$ and $[1 \;\; -1]^T$ that correspond to the distinct
eigenvalues 1 and 0.7. Use this information to determine an eigenvector
expansion of the two-dimensional column vector $\mathbf{x}_0 = [10\,000 \;\; 8000]^T$ that
represents the initial populations of Exton and Wyeville.


Solution 


The eigenvectors correspond to different eigenvalues, so are linearly 
independent. This is also obvious because they are not collinear. 
They therefore form a basis for two-dimensional vectors, so we can write

$$\mathbf{x}_0 = \begin{bmatrix} 10\,000 \\ 8000 \end{bmatrix} = c_1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2c_1 + c_2 \\ c_1 - c_2 \end{bmatrix}.$$

Equating corresponding elements of the column vectors on the right and
the left shows that

$$2c_1 + c_2 = 10\,000 \quad \text{and} \quad c_1 - c_2 = 8000.$$

Solving this simple system of linear equations, we see that $c_1 = 6000$ and
$c_2 = -2000$. Thus

$$\mathbf{x}_0 = 6000\begin{bmatrix} 2 \\ 1 \end{bmatrix} - 2000\begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$


Why is an eigenvector expansion of this kind particularly useful? The
answer lies in the very simple effect that a matrix has on its own
eigenvectors. To take a two-dimensional example, let $\lambda_1$ and $\lambda_2$ be the
eigenvalues corresponding to the linearly independent eigenvectors $\mathbf{v}_1$
and $\mathbf{v}_2$, so $\mathbf{A}\mathbf{v}_i = \lambda_i\mathbf{v}_i$ for i = 1, 2. It then follows from the eigenvector
expansion of any vector v that

$$\mathbf{A}\mathbf{v} = \mathbf{A}(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2.$$
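
As a quick check of this identity, the sketch below applies the transition matrix T of Example 10 to the initial population vector in two ways: by direct matrix multiplication, and by scaling each term of the eigenvector expansion by its eigenvalue. It is an illustration only; the variable names are arbitrary, and exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction as F

# Transition matrix, eigenvectors and eigenvalues from Example 10.
T = [[F(9, 10), F(2, 10)], [F(1, 10), F(8, 10)]]
v1, lam1 = [2, 1], F(1)
v2, lam2 = [1, -1], F(7, 10)
c1, c2 = 6000, -2000          # expansion coefficients found in Example 10
x0 = [10000, 8000]

# Direct matrix multiplication T x0.
direct = [sum(t * x for t, x in zip(row, x0)) for row in T]

# The same product from the expansion: c1*lam1*v1 + c2*lam2*v2.
via_expansion = [c1 * lam1 * a + c2 * lam2 * b for a, b in zip(v1, v2)]

print(direct == via_expansion)           # True
print([int(x) for x in via_expansion])   # [10600, 7400]
```

The second route needs no matrix multiplication at all, which is the point of the simplification described above.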


The value of this kind of simplification will be made clear in the next 
subsection. However, before that you can apply it for yourself in the 
following exercise. 


Exercise 19 


We saw in Subsection 2.2 that the transformation of the plane represented 
by the matrix 


$$\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$$

has eigenvectors $[1 \;\; 1]^T$ and $[-2 \;\; 1]^T$ that correspond to the distinct
eigenvalues 5 and 2.


(a) Use this information to determine an eigenvector expansion of the
position column vector $\mathbf{r}_0 = [-2 \;\; 4]^T$.


(b) Use the eigenvector expansion to calculate $\mathbf{A}\mathbf{r}_0$ without using matrix
multiplication.


2.6 Convergence towards an eigenvector 


We can use what we have just learned about eigenvector expansions to 
provide insight into the behaviour of the population model of 
Subsection 2.1. 
You will recall that the annual changes in population of Exton and
Wyeville were represented by the action of a transition matrix T that
transformed the populations at the beginning of year n into those at the
beginning of year n + 1. We already know, from Example 10, that the
initial populations of Exton and Wyeville can be written as

$$\mathbf{x}_0 = \begin{bmatrix} 10\,000 \\ 8000 \end{bmatrix} = 6000\mathbf{v}_1 - 2000\mathbf{v}_2,$$

where $\mathbf{v}_1 = [2 \;\; 1]^T$ and $\mathbf{v}_2 = [1 \;\; -1]^T$ are the eigenvectors corresponding
to the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 0.7$ of the matrix T.


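Now consider the effect of repeatedly applying the transition matrix T to
the initial population vector $\mathbf{x}_0 = 6000\mathbf{v}_1 - 2000\mathbf{v}_2$. Remembering that
$\mathbf{T}\mathbf{v}_1 = \lambda_1\mathbf{v}_1$ and $\mathbf{T}\mathbf{v}_2 = \lambda_2\mathbf{v}_2$, a single application of T to $\mathbf{x}_0$ gives

$$\begin{aligned}
\mathbf{x}_1 = \mathbf{T}\mathbf{x}_0 &= \mathbf{T}(6000\mathbf{v}_1 - 2000\mathbf{v}_2) \\
&= 6000\,\mathbf{T}\mathbf{v}_1 - 2000\,\mathbf{T}\mathbf{v}_2 \\
&= 6000\lambda_1\mathbf{v}_1 - 2000\lambda_2\mathbf{v}_2.
\end{aligned}$$

A second application of T gives

$$\mathbf{x}_2 = \mathbf{T}\mathbf{x}_1 = \mathbf{T}(6000\lambda_1\mathbf{v}_1 - 2000\lambda_2\mathbf{v}_2) = 6000\lambda_1^2\mathbf{v}_1 - 2000\lambda_2^2\mathbf{v}_2.$$

A third application of T gives

$$\mathbf{x}_3 = \mathbf{T}\mathbf{x}_2 = \mathbf{T}(6000\lambda_1^2\mathbf{v}_1 - 2000\lambda_2^2\mathbf{v}_2) = 6000\lambda_1^3\mathbf{v}_1 - 2000\lambda_2^3\mathbf{v}_2,$$

and after k applications,

$$\mathbf{x}_k = \mathbf{T}\mathbf{x}_{k-1} = 6000\lambda_1^k\mathbf{v}_1 - 2000\lambda_2^k\mathbf{v}_2.$$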


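Now, $\lambda_1 = 1$, so $\lambda_1^k$ is also equal to 1. However, $\lambda_2 = 0.7$, so $\lambda_2^2 = 0.49$,
$\lambda_2^3 = 0.343$ and $\lambda_2^{30} = 0.000\,0225$ (to three significant figures). As we
repeatedly apply T to $\mathbf{x}_0$, we will find that the contribution of $\mathbf{v}_2$ will
become smaller and smaller as k increases, so we have

$$\mathbf{x}_k \simeq 6000\mathbf{v}_1 = \begin{bmatrix} 12\,000 \\ 6000 \end{bmatrix} \quad \text{for large } k.$$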


This is exactly what happened in the population model, where we found 
that the populations approached 12000 in Exton and 6000 in Wyeville. 
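
The convergence can be watched numerically. The sketch below is illustrative only: it applies T repeatedly to the initial populations and, at each step, compares the result with the closed-form expression $6000\lambda_1^k\mathbf{v}_1 - 2000\lambda_2^k\mathbf{v}_2$ derived above. The helper name `apply` and the choice of 30 steps are assumptions made for the illustration.

```python
T = [[0.9, 0.2], [0.1, 0.8]]   # transition matrix of the population model
v1, v2 = [2, 1], [1, -1]       # eigenvectors, with eigenvalues 1 and 0.7


def apply(M, x):
    """Multiply a 2 x 2 matrix by a two-component column vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]


x = [10000.0, 8000.0]          # initial populations of Exton and Wyeville
history = []
for k in range(1, 31):
    x = apply(T, x)
    # Closed form: 6000 * 1**k * v1 - 2000 * 0.7**k * v2.
    predicted = [6000 * v1[i] - 2000 * 0.7 ** k * v2[i] for i in range(2)]
    history.append((k, x, predicted))

for k, xk, pk in history:
    if k in (1, 2, 3, 10, 30):
        print(k, [round(c, 2) for c in xk], [round(c, 2) for c in pk])
```

The two columns of output agree to within rounding error, and by k = 30 both are within a fraction of a person of 12 000 and 6000.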


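Suppose that we start with some other initial population $\mathbf{x}_0$. Because we
can always write $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$, repeated application of T will give

$$\begin{aligned}
\mathbf{x}_k &= \mathbf{T}^k(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) \\
&= c_1\mathbf{T}^k\mathbf{v}_1 + c_2\mathbf{T}^k\mathbf{v}_2 \\
&= c_1\lambda_1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 \\
&\simeq c_1\mathbf{v}_1 \quad \text{for large } k.
\end{aligned}$$

More generally, suppose that we have an arbitrary n × n matrix A and a
vector x. What will be the result of repeated application of A to x, i.e.
what is $\mathbf{A}^k\mathbf{x}$? If A has n linearly independent eigenvectors $\mathbf{v}_i$, then we can
always write x as an eigenvector expansion

$$\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n.$$
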
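Let us assume that we have chosen our initial vector x so that no $c_i$ is
zero. Then since $\mathbf{A}\mathbf{v}_i = \lambda_i\mathbf{v}_i$, where $\lambda_i$ is the eigenvalue corresponding to
eigenvector $\mathbf{v}_i$, we have

$$\mathbf{A}^k\mathbf{v}_i = \lambda_i^k\mathbf{v}_i.$$

Hence

$$\begin{aligned}
\mathbf{A}^k\mathbf{x} &= \mathbf{A}^k(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n) \\
&= c_1\mathbf{A}^k\mathbf{v}_1 + c_2\mathbf{A}^k\mathbf{v}_2 + \cdots + c_n\mathbf{A}^k\mathbf{v}_n \\
&= c_1\lambda_1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 + \cdots + c_n\lambda_n^k\mathbf{v}_n.
\end{aligned}$$

As k gets larger and larger, the eigenvalue with largest modulus will
dominate. So if $\lambda_p$ is the eigenvalue with largest modulus, we have

$$\mathbf{A}^k\mathbf{x} \simeq c_p\lambda_p^k\mathbf{v}_p \quad \text{for large } k.$$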


So we see that for (almost) any initial vector x, for large k, $\mathbf{A}^k\mathbf{x}$ is
proportional to the eigenvector with largest modulus eigenvalue. And since
we ignore any scaling of eigenvectors, we can say that, for large k, $\mathbf{A}^k\mathbf{x}$
is effectively the eigenvector with largest modulus eigenvalue. This
observation is the basis for many numerical algorithms for finding
eigenvectors of very large matrices.
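
One simple algorithm of this kind is often called power iteration. The sketch below is a hedged illustration rather than a production method: the function name `power_iteration`, the starting vector and the fixed number of steps are all assumptions. It rescales the vector at every step (by its component of largest modulus) so that the numbers stay bounded, and it is applied to the matrix of Exercise 18, whose eigenvalue of largest modulus is 2 with eigenvector $[0 \;\; 2 \;\; 1]^T$.

```python
def power_iteration(A, x, steps=50):
    """Repeatedly apply A to x, rescaling after each application.

    Returns (estimate of the largest-modulus eigenvalue, estimate of its
    eigenvector scaled so that its largest-modulus component equals 1).
    """
    lam = 0.0
    for _ in range(steps):
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        # The component of largest modulus serves as the scale factor.
        lam = max(y, key=abs)
        if lam == 0:
            raise ValueError("A x is the zero vector; choose a different start")
        x = [yi / lam for yi in y]
    return lam, x


A = [[1, -2, 4], [-2, 7, -10], [-1, 4, -6]]    # matrix from Exercise 18
lam, v = power_iteration(A, [1.0, 1.0, 1.0])
print(round(lam, 6), [round(c, 6) for c in v])
```

The returned vector is proportional to $[0 \;\; 2 \;\; 1]^T$. The method relies on the starting vector having a non-zero coefficient for the dominant eigenvector, which is the ‘(almost) any initial vector’ proviso in the text.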


Exercise 20 

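Use A, $\mathbf{r}_0$ and the eigenvector expansion of Exercise 19 to determine (to
two significant figures) $\mathbf{r}_8 = \mathbf{A}^8\mathbf{r}_0$, $\mathbf{r}_9 = \mathbf{A}^9\mathbf{r}_0$ and $\mathbf{r}_{10} = \mathbf{A}^{10}\mathbf{r}_0$. Comment
on your results.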


Google’s PageRank algorithm: the world’s largest 
eigenvector problem 


Before Google, web search engines were a hit and miss affair, often 
returning masses of irrelevant links. The advent of Google, in the late 
1990s, changed all that. Apart from its phenomenal speed, it seemed 
to deliver ‘the best’ pages available to a given search. The principal 
reason for this is its PageRank algorithm (see Figure 16), named after 
its co-inventor Larry Page (Figure 17), who also co-founded Google. 


Figure 16 An illustration of how the PageRank algorithm works:
the size of each face is proportional to the total size of the other faces
that are pointing to it


Figure 17 Larry Page, co-inventor of the PageRank algorithm


The PageRank algorithm quantitatively assesses the ‘importance’ of
each page on the web by the number of other ‘important’ web pages
that link to it. One way to get a feel for how it works is to imagine
playing a (rather dull) game. Start on any web page and click on any
link at random. From the new web page, again click on any link at
random. Keep doing this, very many times, randomly going from web
page to web page via the links. The PageRank of a particular web
page is the proportion of time spent visiting that web page.


Mathematically, this is modelled as follows. First label every page on
the web by an integer i (the order is not important). Then construct
a huge matrix A, where every row and column represents a page on
the web. (A is currently approximately of order $10^{10} \times 10^{10}$,
corresponding to the roughly $10^{10}$ current web pages.) Roughly speaking,
each element $A_{ij}$ is set equal to the probability of going from web
page i to web page j by randomly clicking on a link; e.g. $A_{5,23}$ is the
probability of visiting web page 23 by randomly clicking on a link on
web page 5. Now consider a vector, each element of which represents
the probability of being on a web page. Starting the game on web
page i is represented by the vector $\mathbf{x}_0$, all of whose elements are zero
except for the ith element, which is unity. The vector $\mathbf{x}_1 = \mathbf{A}\mathbf{x}_0$ gives
the probability of being on any web page after one click. The vector
$\mathbf{x}_n = \mathbf{A}^n\mathbf{x}_0$ gives the probability of being on any web page after
n clicks. The situation is very similar to the Exton–Wyeville
population model, and the final PageRank is given by the eigenvector
of A with largest eigenvalue, i.e. $\mathbf{x}_n$ for very large n. The web page
with largest PageRank corresponds to the largest element of $\mathbf{x}_n$.
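
The random-clicking game can be imitated on a toy ‘web’. The sketch below is purely illustrative and is not Google's implementation: the three-page link structure, the function name `random_surfer` and the number of clicks are all invented for the example. So that $\mathbf{x}_1 = \mathbf{A}\mathbf{x}_0$ works with column vectors, the code stores in column j of A the probabilities of leaving page j, which is an assumption about the indexing convention.

```python
def random_surfer(links, start, clicks):
    """Probability of being on each page after a number of random clicks.

    links maps each page number to the list of pages it links to
    (a hypothetical toy web; every page must have at least one link).
    """
    n = len(links)
    # Column j of A holds the probabilities of leaving page j.
    A = [[0.0] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            A[i][j] = 1.0 / len(targets)
    # Start with certainty on the chosen page.
    x = [0.0] * n
    x[start] = 1.0
    for _ in range(clicks):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
links = {0: [1, 2], 1: [2], 2: [0]}
ranks = random_surfer(links, start=0, clicks=200)
print([round(r, 4) for r in ranks])   # [0.4, 0.2, 0.4]
```

For this toy web the limiting vector satisfies $\mathbf{A}\mathbf{x} = \mathbf{x}$, so it is an eigenvector with eigenvalue 1, in the same way as the long-term populations in the Exton–Wyeville model.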


3 Finding eigenvalues and eigenvectors 


You have now seen several examples of eigenvectors and eigenvalues, and
some of their uses. You have also been told a little about finding 
eigenvectors and eigenvalues through iteration. The time has now come for 
a more detailed investigation of the determination of eigenvalues and 
eigenvectors. In the next subsection we introduce the important notion of 
the characteristic equation of a matrix. This is a polynomial equation, the 
roots of which are the eigenvalues of the matrix. Once the eigenvalues have 
been found, methods based on row operations can be used to construct the 
related eigenvectors. We will examine some examples based on 2 × 2 and
3 × 3 matrices, but you should be aware that many of the ideas are very
general, and that modern applied mathematics makes extensive use of 
computer packages to find the eigenvectors and eigenvalues of matrices, 
often based on the row operations and iterative methods that have already 
been mentioned. 

3.1 The characteristic equation


An eigenvector v of an n × n matrix A satisfies $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$. By introducing
an n × n identity matrix I, this can be written as $\mathbf{A}\mathbf{v} = \lambda\mathbf{I}\mathbf{v}$, i.e.

$$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}. \tag{9}$$

Clearly v = 0 is a solution of this equation, called the trivial solution.
We are not interested in this because, by definition, v = 0 is not an
eigenvector. For there to be non-trivial solutions v, the n × n square
matrix on the left, $\mathbf{A} - \lambda\mathbf{I}$, known as the characteristic matrix of A,
must be non-invertible. (If it were invertible, the unique solution would be
$\mathbf{v} = (\mathbf{A} - \lambda\mathbf{I})^{-1}\mathbf{0} = \mathbf{0}$, which is trivial.) For the characteristic matrix to be
non-invertible, it must be singular, i.e. its determinant must be zero:
$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$. Expanding the determinant gives a polynomial equation
of degree n satisfied by every eigenvalue of A.


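For example, in the case of a 2 × 2 matrix $\mathbf{A} = [a_{ij}]$ and an eigenvector
$\mathbf{v} = [v_1 \;\; v_2]^T$, equation (9) is

$$\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

which can also be written as

$$\begin{bmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

So for a non-trivial solution, we require

$$\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = 0.$$

The determinant here gives a quadratic equation for λ, whose two
solutions are the eigenvalues of A.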


Characteristic equation 
Let A be any square matrix. The equation

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0 \tag{10}$$

is called the characteristic equation of A. Its roots, i.e. the values
of λ that satisfy the characteristic equation, are the eigenvalues of A.


Example 11 


Write out the characteristic equation of the transition matrix introduced in 
the Exton—Wyeville population model, 


$$\mathbf{T} = \begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix},$$


and use it to determine the two eigenvalues that arise in that model. 



Solution 
In this case, the characteristic equation is given by

$$\begin{vmatrix} 0.9 - \lambda & 0.2 \\ 0.1 & 0.8 - \lambda \end{vmatrix} = 0.$$

Expanding the determinant gives

$$(0.9 - \lambda)(0.8 - \lambda) - 0.1 \times 0.2 = 0,$$

which can be rewritten as

$$\lambda^2 - 1.7\lambda + 0.7 = 0.$$

This is a quadratic equation in λ that may be either solved by the
standard formula or factorised to give

$$(\lambda - 1.0)(\lambda - 0.7) = 0.$$

By either approach, it is clear that there are only two roots, $\lambda_1 = 1.0$ and
$\lambda_2 = 0.7$. You learned earlier that these are the two eigenvalues of T.


Exercise 21 


Write out the characteristic equation of the matrix 
$$\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$$

and hence find the eigenvalues of A.


3.2 Eigenvalues of 2 x 2 matrices 


The following procedure, based on the characteristic equation, can be used 
to find the two eigenvalues of any 2 x 2 matrix. 


Procedure 2 Finding eigenvalues of a 2 x 2 matrix 


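Let $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. To find the eigenvalues of A, do the following.

1. Write down the characteristic equation $\det(\mathbf{A} - \lambda\mathbf{I}) = 0$.
2. Expand this as

$$\begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc) = 0. \tag{11}$$

3. Solve this quadratic equation to find the two values of λ, which
are the required eigenvalues.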

Exercise 22


Calculate the eigenvalues of the following matrices. 


@e@=[ 3] mH=[7 3 


Exercise 23 


Calculate the eigenvalues of the following two-dimensional rotation and 
scaling matrices. 


cos@ —sin@ l O 
(a) R= fee me (b) M= k 


For a 2 x 2 matrix we can always solve the quadratic equation that results 
from the characteristic equation and hence determine the two eigenvalues. 
Nonetheless, it is worth investigating the 2 x 2 case a little further because 
of the light that it can shed on a number of problems. 


As a first step we introduce the quantity known as the trace of a matrix. 


Trace of a matrix 


Given any n × n matrix $\mathbf{A} = [a_{ij}]$, the trace of that matrix is
denoted tr A and is the sum of the elements on its leading diagonal:

$$\operatorname{tr}\mathbf{A} = a_{11} + a_{22} + \cdots + a_{nn}. \tag{12}$$


It follows from this definition that in the case of a general 2 × 2 matrix
$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, its trace is given by tr A = a + d. Moreover, for such a matrix
the determinant is det A = ad − bc. It therefore follows from equation (11)
that the expanded form of the characteristic equation of a 2 × 2 matrix can
be written as follows.


Characteristic equation of a 2 × 2 matrix

$$\lambda^2 - (\operatorname{tr}\mathbf{A})\,\lambda + \det\mathbf{A} = 0. \tag{13}$$


Applying the generic formula for finding the roots of a quadratic equation, 
we see that the two eigenvalues are given by 


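$$\lambda = \tfrac{1}{2}\left(\operatorname{tr}\mathbf{A} \pm \sqrt{D}\right), \quad \text{where } D = (\operatorname{tr}\mathbf{A})^2 - 4\det\mathbf{A}. \tag{14}$$

The quantity D is known as the discriminant. For a 2 × 2 matrix
$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the discriminant is

$$D = (a + d)^2 - 4(ad - bc) = (a - d)^2 + 4bc.$$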



Hence the two eigenvalues can be expressed as follows. 


Eigenvalues of a 2 x 2 matrix 


$$\lambda = \tfrac{1}{2}\left(a + d \pm \sqrt{(a - d)^2 + 4bc}\right). \tag{15}$$


For a real 2 × 2 matrix we thus have three cases:
• a pair of distinct real eigenvalues if D > 0
• repeated real eigenvalues if D = 0
• a pair of complex eigenvalues if D < 0.


The first and third of these cases correspond to the normal situation with
two distinct eigenvalues. Note, however, that when D < 0, we have
$\lambda = \tfrac{1}{2}\left(\operatorname{tr}\mathbf{A} \pm \sqrt{-|D|}\right) = \tfrac{1}{2}\left(\operatorname{tr}\mathbf{A} \pm i\sqrt{|D|}\right)$, so the eigenvalues are complex,
with one eigenvalue being the complex conjugate of the other. (You saw
instances of this in Exercises 22 and 23.) The second case, when D = 0,
means that the characteristic equation has a repeated root.
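
Equation (14) and the three cases translate directly into a short routine. The sketch below is an illustration only: the function name `eigenvalues_2x2` is hypothetical, and the second and third test matrices are invented for the example rather than taken from the text. It uses Python's complex square root so that the case D < 0 needs no special treatment.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from equation (14), plus the case label."""
    tr = a + d
    det = a * d - b * c
    D = tr * tr - 4 * det          # discriminant
    root = cmath.sqrt(D)           # complex square root copes with D < 0
    lam1, lam2 = (tr + root) / 2, (tr - root) / 2
    if D > 0:
        kind = "distinct real"
    elif D == 0:
        kind = "repeated real"
    else:
        kind = "complex conjugate pair"
    return lam1, lam2, kind


print(eigenvalues_2x2(3, 2, 1, 4))    # eigenvalues 5 and 2 (Subsection 2.2)
print(eigenvalues_2x2(3, 1, -1, 1))   # hypothetical matrix with D = 0
print(eigenvalues_2x2(1, -2, 1, 3))   # hypothetical matrix with D < 0
```

With floating-point entries the test `D == 0` is fragile, so in practice a small tolerance would be needed; the exact comparison here is adequate only because the examples have integer entries.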


Exercise 24 
Show that the characteristic equation of the matrix S = 4 4 , where s is 


a number, has a repeated root, and determine what it is. 


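There are two useful results that follow directly from equation (14). First,
note that adding the two eigenvalues gives

$$\lambda_1 + \lambda_2 = \operatorname{tr}\mathbf{A}.$$

Second, note that multiplying the eigenvalues gives

$$\begin{aligned}
\lambda_1\lambda_2 &= \tfrac{1}{2}\left(\operatorname{tr}\mathbf{A} + \sqrt{D}\right) \cdot \tfrac{1}{2}\left(\operatorname{tr}\mathbf{A} - \sqrt{D}\right) \\
&= \tfrac{1}{4}\left((\operatorname{tr}\mathbf{A})^2 - D\right) \\
&= \det\mathbf{A}.
\end{aligned}$$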

Although we have derived these results for a 2 x 2 matrix, they both turn 
out to be generally true for square matrices of any order. The two general 
results are as follows. 


Trace and determinant rules for eigenvalues 
For any n × n matrix:
• the trace is equal to the sum of all its eigenvalues
• the determinant is equal to the product of all its eigenvalues.


These results are frequently used to assist and check matrix calculations. 

Exercise 25

Given that $\lambda_1 = 26.115$ and $\lambda_2 = -0.115$ are the eigenvalues of the matrix

$$\mathbf{A} = \begin{bmatrix} 11 & 12 \\ 14 & 15 \end{bmatrix},$$

confirm that they pass the trace and determinant checks (working to two
decimal places).


3.3 Eigenvalues of larger matrices 


Formulating and solving the characteristic equation of some 3 x 3 matrices 
is not too daunting, but beyond that the task becomes manually 
challenging. With the wide availability of computer packages, there is a 
tendency to turn rapidly to a machine that will either perform the 
necessary algebra or provide numerical estimates of the eigenvalues. 


Example 12 
Write down the characteristic equation of the matrix

$$\mathbf{A} = \begin{bmatrix} 3 & -1 & -1 \\ -1 & 3 & 1 \\ -1 & 1 & 5 \end{bmatrix},$$

and given that one of its roots is 6, find the other two roots. Confirm that
the sum of the roots gives the trace of A, and the product of the roots is
the determinant of A.


Solution 
In this case the characteristic equation of A (equation (10)) is given by

$$\det(\mathbf{A} - \lambda\mathbf{I}) = \begin{vmatrix} 3 - \lambda & -1 & -1 \\ -1 & 3 - \lambda & 1 \\ -1 & 1 & 5 - \lambda \end{vmatrix} = 0.$$

Using Laplace’s rule (see Unit 4) to expand the determinant in terms of
the elements of the top row gives

$$(3 - \lambda)[(3 - \lambda)(5 - \lambda) - 1] - (-1)[-(5 - \lambda) + 1] - 1[-1 + (3 - \lambda)] = 0,$$

which may be rewritten as

$$\lambda^3 - 11\lambda^2 + 36\lambda - 36 = 0. \tag{16}$$

Knowing that one of the roots of this equation is λ = 6, we can extract a
factor (λ − 6) to obtain

$$(\lambda - 6)(\lambda^2 - 5\lambda + 6) = 0.$$

(Factorising an equation like this is best done in bits. First, knowing that
λ = 6 is a root, we write $(\lambda - 6)(a\lambda^2 + b\lambda + c)$ for some constants a, b, c.
Then comparing with equation (16) immediately gives a = 1 and c = 6.
Finally, expanding $(\lambda - 6)(\lambda^2 + b\lambda + 6)$ and comparing with equation (16)
gives b = −5.)

Then, either by using the usual formula to factorise the quadratic function,
or factorising by inspection, we can write

$$(\lambda - 6)(\lambda - 2)(\lambda - 3) = 0.$$


This shows that the three eigenvalues are 2, 3 and 6. 


Adding the three eigenvalues gives 2 + 3 + 6 = 11, which is also the sum of
the leading diagonal elements, tr A = 3 + 3 + 5. This confirms that the
trace of A is equal to the sum of its eigenvalues.


Similarly, the product of the eigenvalues is 2 × 3 × 6 = 36, while using
Laplace’s rule to expand det A gives

$$\det\mathbf{A} = 3(15 - 1) - (-1)(-5 + 1) - 1(-1 + 3) = 42 - 4 - 2 = 36.$$


This confirms that the determinant of A is equal to the product of its 
eigenvalues. 
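
The checks in Example 12 can also be carried out mechanically. The following is a minimal sketch in plain Python, assuming the matrix entries read off from the determinant expansion above; the helper names `det3` and `char_poly` are illustrative choices, not part of the module text.

```python
# Sketch: verify the trace and determinant rules for the matrix of Example 12.
A = [[3, -1, -1],
     [-1, 3, 1],
     [-1, 1, 5]]
eigenvalues = [2, 3, 6]


def det3(m):
    """Determinant of a 3 x 3 matrix by Laplace expansion along the top row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_poly(m, lam):
    """Evaluate det(m - lam*I) for a 3 x 3 matrix m."""
    shifted = [[m[i][j] - (lam if i == j else 0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


trace = sum(A[i][i] for i in range(3))
product = 1
for lam in eigenvalues:
    product *= lam

print(trace, sum(eigenvalues))                      # 11 11
print(det3(A), product)                             # 36 36
print([char_poly(A, lam) for lam in eigenvalues])   # [0, 0, 0]
```

Each eigenvalue makes $\det(A - \lambda I)$ vanish, and the sum and product agree with the trace and determinant.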


Exercise 26 


Find the eigenvalues of the following matrices, given that in each case one 
of the eigenvalues is 2. 


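(a) $A = \begin{bmatrix} 1 & -2 & 4 \\ -2 & 7 & -10 \\ -1 & 4 & -6 \end{bmatrix}$
(b) $A = \begin{bmatrix} 4 & 7 & 6 \\ 6 & 5 & 6 \\ -8 & -10 & -10 \end{bmatrix}$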

Exercise 27 


Check that your answers to Exercise 26 satisfy the trace and determinant 
rules for eigenvalues. 


3.4 General results on eigenvalues 


In this subsection we list some general results regarding the eigenvalues of 
certain types of n x n matrices. We mainly prove the results for 2 x 2 
matrices, but you should note that they are valid in general. 


We already know the trace and determinant rules, which are valid for any 
square matrix. Now let us consider some rules that apply to special types 
of matrices. 


Real matrices 


Recall that a real matrix is one whose elements are all real. In
Subsection 3.2, we proved that for a real 2 x 2 matrix, if the eigenvalues
are complex, then they occur in complex conjugate pairs; i.e. for a real
matrix, if $\lambda$ is an eigenvalue, then so is $\overline{\lambda}$.


It is straightforward to prove this for any real n x n matrix. If $A$ is a real matrix
and $\lambda$ is a complex eigenvalue with corresponding eigenvector $\mathbf{v}$, then
$A\mathbf{v} = \lambda\mathbf{v}$. So taking the complex conjugate of both sides, we get
$\overline{A\mathbf{v}} = \overline{\lambda\mathbf{v}}$, which is equivalent to
\[
\overline{A}\,\overline{\mathbf{v}} = \overline{\lambda}\,\overline{\mathbf{v}}.
\]
But $A$ is real, so $\overline{A} = A$, and we have
\[
A\overline{\mathbf{v}} = \overline{\lambda}\,\overline{\mathbf{v}}.
\]

Hence $\overline{\lambda}$ is also an eigenvalue of $A$, with corresponding eigenvector $\overline{\mathbf{v}}$.
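
As an illustration of the argument above, here is a small sketch using Python's built-in complex numbers. The matrix (a rotation through a right angle) and the helper `matvec` are chosen here purely for illustration; they do not come from the module text.

```python
# Sketch: for a real matrix, conjugating an eigenpair gives another eigenpair.
A = [[0, -1],
     [1, 0]]          # a real 2 x 2 matrix with eigenvalues i and -i


def matvec(m, v):
    """Multiply the matrix m by the column vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


lam = 1j
v = [1, -1j]                                   # eigenvector for lam = i

lam_bar = lam.conjugate()
v_bar = [complex(x).conjugate() for x in v]    # conjugate every component

print(matvec(A, v) == [lam * x for x in v])              # True
print(matvec(A, v_bar) == [lam_bar * x for x in v_bar])  # True
```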


Exercise 28 


The matrix
\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}
\]
has one eigenvalue $\lambda_1 = 1 + i$. Determine the other two eigenvalues
without solving the characteristic equation.


Triangular matrices 


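A matrix is triangular if all the entries above (or below) the leading
diagonal are 0, e.g.
\[
\begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}, \qquad
\begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix}, \qquad
\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}
\]
(upper triangular, lower triangular and diagonal, respectively). A diagonal matrix is a special type of triangular matrix.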


For a 2 x 2 triangular matrix $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ or $\begin{bmatrix} a & 0 \\ c & d \end{bmatrix}$, the characteristic equation
is
\[
(a - \lambda)(d - \lambda) = 0,
\]
hence the eigenvalues are $\lambda = a$ and $\lambda = d$. Thus the eigenvalues of a
triangular matrix are the diagonal entries. This is true for any n x n
triangular matrix, and the proof is very similar to that given above.
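
As a sketch of how the same argument runs for a larger matrix, consider a 3 x 3 upper triangular matrix with entries $a, b, c, d, e, f$ (symbols chosen here for illustration). Expanding the determinant down the first column, which contains a single non-zero entry, gives a product of the diagonal factors:

```latex
\[
\det(A - \lambda I) =
\begin{vmatrix} a-\lambda & b & c \\ 0 & d-\lambda & e \\ 0 & 0 & f-\lambda \end{vmatrix}
= (a-\lambda)\begin{vmatrix} d-\lambda & e \\ 0 & f-\lambda \end{vmatrix}
= (a-\lambda)(d-\lambda)(f-\lambda).
\]
```

Setting this to zero gives $\lambda = a$, $\lambda = d$ and $\lambda = f$, the diagonal entries.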


Exercise 29 


What are the eigenvalues of the matrix F 4 ? 


Real symmetric matrices 


A matrix is symmetric if it is equal to its transpose, i.e. $A = A^T$; that
is, the entries are symmetric about the leading diagonal, e.g.
\[
\begin{bmatrix} a & b \\ b & d \end{bmatrix}, \qquad
\begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}.
\]


For a real 2 x 2 symmetric matrix $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$, the characteristic equation
is (see equation (11))
\[
\lambda^2 - (a + d)\lambda + (ad - b^2) = 0,
\]
hence the eigenvalues are
\[
\lambda = \tfrac{1}{2}\Bigl(a + d \pm \sqrt{(a + d)^2 - 4(ad - b^2)}\Bigr).
\]

The term under the square root (i.e. the discriminant) is
\[
(a + d)^2 - 4(ad - b^2) = (a^2 + 2ad + d^2) - 4ad + 4b^2 = (a - d)^2 + 4b^2,
\]


which is the sum of two squares and therefore cannot be negative. It
follows that the eigenvalues of a real symmetric matrix are real.

(Note that if a matrix has real eigenvalues, it is not necessarily
true that it is a real symmetric matrix.)

There is a well-known proof of this result for n x n real symmetric
matrices, which we include in Subsection 4.2, but you are not required to
know it.
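
The formula above translates directly into a few lines of code. This is a minimal sketch only; the function name `symmetric_2x2_eigenvalues` is a hypothetical choice, and the test matrix is the symmetric matrix with $a = d = 2$, $b = 1$.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    # The discriminant is a sum of two squares, so it is never negative
    # and the square root below is always real.
    discriminant = (a - d) ** 2 + 4 * b ** 2
    root = math.sqrt(discriminant)
    return ((a + d - root) / 2, (a + d + root) / 2)


print(symmetric_2x2_eigenvalues(2, 1, 2))   # (1.0, 3.0)
```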


Exercise 30 


Under what circumstances can a symmetric matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ have a repeated
eigenvalue?


Non-invertible matrices 


A non-invertible matrix has determinant equal to 0 (see Unit 4). However,
from the determinant rule, if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of an n x n
non-invertible matrix $A$, then
\[
\lambda_1 \lambda_2 \cdots \lambda_n = \det A = 0.
\]
It follows that a matrix is non-invertible if and only if at least one of its
eigenvalues is 0. Also, a matrix is invertible if and only if all its eigenvalues
are non-zero.


We summarise what we have learned. 


Properties of eigenvalues — summary 
• The product of the eigenvalues of A is det A.
• The sum of the eigenvalues of A is tr A.
• The complex eigenvalues and corresponding eigenvectors of a real
  matrix occur in complex conjugate pairs.
• The eigenvalues of a triangular matrix are the diagonal entries.
• The eigenvalues of a real symmetric matrix are real.
• A matrix is non-invertible if and only if at least one of its
  eigenvalues is 0.


Exercise 31


Without solving the characteristic equation, what can you say about the 
eigenvalues of each of the following matrices? 


@a=[% 2] wa-[% @] oa a 


We now turn to eigenvectors. 


3.5 The eigenvector equation 


An eigenvector $\mathbf{v}$ of a general n x n real matrix $A$ satisfies $A\mathbf{v} = \lambda\mathbf{v}$. As
you saw in Subsection 3.1 (equation (9)), by introducing an n x n identity
matrix $I$, this condition can be written as the following eigenvector
equation.


Eigenvector equation
\[
(A - \lambda I)\mathbf{v} = \mathbf{0}. \tag{17}
\]


Earlier we were interested in the determinant of the left-hand side that led 
to the characteristic equation and the eigenvalues. Now we need to find 
the non-zero solutions of the linear equations themselves. This is most 
easily described in the context of 2 x 2 matrices, as in the next subsection. 


3.6 Eigenvectors of 2 x 2 matrices


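Consider a 2 x 2 matrix
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
\]
Suppose that we have solved the characteristic equation $\det(A - \lambda I) = 0$
to find the eigenvalues of this matrix. Now let us try to find the
eigenvectors. Suppose that the unknown eigenvector $\mathbf{v} = [x \;\; y]^T$ has a
known eigenvalue $\lambda$. Then the eigenvector equation
\[
(A - \lambda I)\mathbf{v} = \mathbf{0}
\]
gives
\[
\begin{bmatrix} a-\lambda & b \\ c & d-\lambda \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
which is equivalent to the equations
\[
(a - \lambda)x + by = 0, \tag{18}
\]
\[
cx + (d - \lambda)y = 0. \tag{19}
\]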


Since we know $a$, $b$, $c$, $d$ and $\lambda$, this is a pair of simultaneous linear
equations that can be solved for $x$ and $y$. We did this in Section 1.
However, there is a subtlety: because $\det(A - \lambda I) = 0$, the system of
equations is singular. Such systems of linear equations were discussed in
Subsection 1.4, where we discovered that they can have either no solution
or an infinity of solutions. In fact, it turns out that the two equations (18)
and (19) are really the same equation written in two different-looking ways.
So we need to solve only one of them, as the other gives no information,
and thus we obtain an infinity of solutions (see Example 5(c) for a similar case).


We should not be surprised by this, because the eigenvector equation
determines the eigenvectors only up to an arbitrary scalar multiple. So
when we solve equation (18) or (19) for $x$ and $y$, we obtain a solution of
the form $\mathbf{v} = k[x \;\; y]^T$, for an arbitrary (non-zero) scalar $k$.


If we want, we can assign an arbitrary value to $k$, such as $k = 1$. It should
then be understood that any (non-zero) scalar multiple is also an
eigenvector. However we choose to handle it, having found the eigenvector
corresponding to $\lambda$, our next step should be to use the other eigenvalue of
$A$ to find a second eigenvector.


Here is a numerical example of what we have just been describing. 


Example 13 
Find the eigenvalues and corresponding eigenvectors of
\[
A = \begin{bmatrix} 1 & 4 \\ 1 & -2 \end{bmatrix}.
\]

Solution

The characteristic equation is
\[
\begin{vmatrix} 1-\lambda & 4 \\ 1 & -2-\lambda \end{vmatrix} = 0.
\]
Expanding gives $(1 - \lambda)(-2 - \lambda) - 4 = 0$, which simplifies to
$\lambda^2 + \lambda - 6 = 0$. So the eigenvalues are $\lambda = 2$ and $\lambda = -3$.

Let $\mathbf{v} = [x \;\; y]^T$ be an eigenvector. Then the eigenvector equation
$(A - \lambda I)\mathbf{v} = \mathbf{0}$ is
\[
\begin{bmatrix} 1-\lambda & 4 \\ 1 & -2-\lambda \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
which is equivalent to the system
\[
(1 - \lambda)x + 4y = 0, \qquad x + (-2 - \lambda)y = 0.
\]
Now consider these for each of the eigenvalues in turn.

• For $\lambda = 2$, the eigenvector equations become
  \[ -x + 4y = 0 \quad\text{and}\quad x - 4y = 0. \]
  Clearly these equations are the same, so we have the single equation
  $4y = x$. This single equation shows that if $x = k$, then $y = k/4$. So the
  eigenvector corresponding to $\lambda = 2$ has the general form $[k \;\; k/4]^T$.
  Choosing $k = 4$ for convenience gives $[4 \;\; 1]^T$ as an eigenvector
  corresponding to eigenvalue $\lambda = 2$. As usual, any non-zero scalar
  multiple is also an eigenvector.

• For $\lambda = -3$, the eigenvector equations become
  \[ 4x + 4y = 0 \quad\text{and}\quad x + y = 0, \]
  which are equivalent to the single equation $y = -x$. This single
  equation tells us that if $x = k$, then $y = -k$. So the eigenvector
  corresponding to $\lambda = -3$ has the general form $[k \;\; -k]^T$; choosing
  $k = 1$ gives $[1 \;\; -1]^T$.

(To prove that the two equations are the same in general, use the fact that
$\det(A - \lambda I) = (a - \lambda)(d - \lambda) - bc = 0$. Note that even though the two
equations provide the same information, both are examined since this is a
useful check.)

Of course, it is always good practice to check that we have correctly
determined our eigenvalues and eigenvectors by checking that they satisfy
$A\mathbf{v} = \lambda\mathbf{v}$ explicitly.


Here, as a summary, is the general procedure. 


Procedure 3 Finding the eigenvectors of a 2 x 2 matrix 


Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. To find an eigenvector corresponding to the
eigenvalue $\lambda$, do the following.

1. Write down the eigenvector equations
   \[ (a - \lambda)x + by = 0, \tag{20} \]
   \[ cx + (d - \lambda)y = 0. \tag{21} \]
2. These equations reduce to a single equation that is readily solved
   for $x$ and $y$. The eigenvector is given by $\mathbf{v} = [x \;\; y]^T$, with $x$ and
   $y$ replaced by their solved values. Any non-zero scalar multiple is
   also an eigenvector.
3. Repeat for the second eigenvalue to find the second eigenvector.
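
The steps of this procedure can be sketched in code. The function below is an illustrative sketch only: the name `eigenpairs_2x2` is hypothetical, and it assumes real, distinct eigenvalues and $b \neq 0$, so that equation (20) can be solved by taking $x = b$, $y = \lambda - a$. It is tried on the matrix of Example 13.

```python
import math


def eigenpairs_2x2(a, b, c, d):
    """Eigenvalue/eigenvector pairs of [[a, b], [c, d]].

    Assumes real, distinct eigenvalues and b != 0 (a simplifying
    assumption made for this sketch, not a general method).
    """
    trace, det = a + d, a * d - b * c
    disc = trace ** 2 - 4 * det
    if disc <= 0 or b == 0:
        raise ValueError("sketch covers only real, distinct eigenvalues with b != 0")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # Equation (20): (a - lam) x + b y = 0 is satisfied by x = b, y = lam - a.
        pairs.append((lam, (b, lam - a)))
    return pairs


for lam, (x, y) in eigenpairs_2x2(1, 4, 1, -2):
    # Check A v = lam v component by component.
    print(lam, (x, y), (1 * x + 4 * y, 1 * x - 2 * y) == (lam * x, lam * y))
```

For Example 13 this yields the eigenvectors $[4 \;\; 1]^T$ for $\lambda = 2$ and $[4 \;\; -4]^T$ for $\lambda = -3$, the latter being a scalar multiple of $[1 \;\; -1]^T$.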


Exercise 32 


Find the eigenvalues and eigenvectors of the following matrices. 


a Fr a as fi | 


Exercise 33 


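Find the eigenvectors of
\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]
when $\theta$ is not an integer multiple of $\pi$.

(Hint: The eigenvalues $\cos\theta \pm i\sin\theta = e^{\pm i\theta}$ were found in Exercise 23.)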


3.7 Eigenvectors of larger matrices


In the general case, if $A$ is an n x n matrix, there will be n eigenvalues.
For each eigenvalue $\lambda$, the eigenvector equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$ will provide
n simultaneous equations, and we must solve these to determine the
n components of the corresponding eigenvector. As before, because
$\det(A - \lambda I) = 0$, the system of n equations is singular with an infinity of
solutions, corresponding to the fact that any (non-zero) scalar multiple of
an eigenvector is also an eigenvector. That means that in at least one case
(and perhaps more), we will not be able to determine a unique value for a
component of $\mathbf{v}$, and we will have to represent that component by an
arbitrary non-zero scalar $k$. The other components can then be expressed
in terms of that value $k$.


Often, the best way of solving the system of eigenvector equations is to use 
Gaussian elimination, as described in Section 1. The procedure is to 
express the eigenvector equations in terms of an augmented matrix, then 
use row operations to reduce the coefficient matrix to upper triangular 
form. The fact that the system is singular will result in at least one row of 
zeros in the augmented matrix, indicating that the corresponding unknown 
can be assigned the arbitrary value k. This row can be made the bottom 
row of the augmented matrix. All the other components can then be 
expressed in terms of k, using back substitution. 


Here is a numerical example. 


Example 14 
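Find the three eigenvectors of
\[
A = \begin{bmatrix} 1 & -2 & 4 \\ -2 & 7 & -10 \\ -1 & 4 & -6 \end{bmatrix},
\]
given that the eigenvalues are −1, 1 and 2.

Solution

The eigenvector equation for the given matrix, for $\mathbf{v} = [x_1 \;\; x_2 \;\; x_3]^T$, may
be written
\[
\begin{bmatrix} 1-\lambda & -2 & 4 \\ -2 & 7-\lambda & -10 \\ -1 & 4 & -6-\lambda \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
This is equivalent to the system
\[
\begin{aligned}
(1 - \lambda)x_1 - 2x_2 + 4x_3 &= 0, \\
-2x_1 + (7 - \lambda)x_2 - 10x_3 &= 0, \\
-x_1 + 4x_2 + (-6 - \lambda)x_3 &= 0.
\end{aligned}
\]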


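• For $\lambda = -1$, the augmented matrix is
\[
\left[\begin{array}{rrr|r} 2 & -2 & 4 & 0 \\ -2 & 8 & -10 & 0 \\ -1 & 4 & -5 & 0 \end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}.
\]

Reducing the elements below the leading diagonal in column 1 to zero:
\[
\begin{array}{l} \\ R_2 + R_1 \\ R_3 + \tfrac{1}{2}R_1 \end{array}
\left[\begin{array}{rrr|r} 2 & -2 & 4 & 0 \\ 0 & 6 & -6 & 0 \\ 0 & 3 & -3 & 0 \end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}.
\]

Reducing the element below the leading diagonal in column 2 to zero:
\[
\begin{array}{l} \\ \\ R_{3a} - \tfrac{1}{2}R_{2a} \end{array}
\left[\begin{array}{rrr|r} 2 & -2 & 4 & 0 \\ 0 & 6 & -6 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]

The final row of zeros (a result of the expected singular system) allows
us to assign $x_3$ the value $k$, which is arbitrary apart from the
requirement that the resulting eigenvector should be non-zero. Back
substitution then tells us that $6x_2 = 6k$, so $x_2 = k$, and back
substituting again gives $2x_1 - 2k + 4k = 0$, so $x_1 = -k$. Thus the
general form of the eigenvector is $\mathbf{v} = k[-1 \;\; 1 \;\; 1]^T$, where $k$ is an
arbitrary non-zero value.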


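• For $\lambda = 1$, the augmented matrix is
\[
\left[\begin{array}{rrr|r} 0 & -2 & 4 & 0 \\ -2 & 6 & -10 & 0 \\ -1 & 4 & -7 & 0 \end{array}\right].
\]

In this case it will be helpful to interchange rows before doing
anything else, so the starting arrangement will be
\[
\left[\begin{array}{rrr|r} -1 & 4 & -7 & 0 \\ 0 & -2 & 4 & 0 \\ -2 & 6 & -10 & 0 \end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}.
\]

Completing the reduction of the elements below the leading diagonal
in column 1 to zero:
\[
\begin{array}{l} \\ \\ R_3 - 2R_1 \end{array}
\left[\begin{array}{rrr|r} -1 & 4 & -7 & 0 \\ 0 & -2 & 4 & 0 \\ 0 & -2 & 4 & 0 \end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_{3a} \end{array}.
\]

Reducing the element below the leading diagonal in column 2 to zero:
\[
\begin{array}{l} \\ \\ R_{3a} - R_2 \end{array}
\left[\begin{array}{rrr|r} -1 & 4 & -7 & 0 \\ 0 & -2 & 4 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]

The final row of zeros allows us to assign $x_3$ the arbitrary non-zero
value $k$. Back substitution then tells us that $x_2 = 2k$, and back
substituting again gives $x_1 = k$. Thus the general form of the
eigenvector is $\mathbf{v} = k[1 \;\; 2 \;\; 1]^T$, where $k$ is an arbitrary non-zero value.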


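• For $\lambda = 2$, the augmented matrix is
\[
\left[\begin{array}{rrr|r} -1 & -2 & 4 & 0 \\ -2 & 5 & -10 & 0 \\ -1 & 4 & -8 & 0 \end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}.
\]

Reducing the elements below the leading diagonal in column 1 to zero:
\[
\begin{array}{l} \\ R_2 - 2R_1 \\ R_3 - R_1 \end{array}
\left[\begin{array}{rrr|r} -1 & -2 & 4 & 0 \\ 0 & 9 & -18 & 0 \\ 0 & 6 & -12 & 0 \end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}.
\]

Reducing the element below the leading diagonal in column 2 to zero:
\[
\begin{array}{l} \\ \\ R_{3a} - \tfrac{2}{3}R_{2a} \end{array}
\left[\begin{array}{rrr|r} -1 & -2 & 4 & 0 \\ 0 & 9 & -18 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]

The final row of zeros allows us to assign $x_3$ the arbitrary non-zero
value $k$. Back substitution then tells us that $x_2 = 2k$, and back
substituting again gives $x_1 = 0$. Thus the general form of the
eigenvector is $\mathbf{v} = k[0 \;\; 2 \;\; 1]^T$, where $k$ is an arbitrary non-zero value.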


Here, as a summary, is the general procedure for finding eigenvectors. 


Procedure 4 Finding eigenvectors in general 


Let $A$ be an n x n matrix. To find the eigenvectors of $A$, first solve
its characteristic equation, to find the eigenvalues. Then for each
eigenvalue $\lambda$, solve the eigenvector equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$ to find $\mathbf{v}$.


If $\lambda$ is not a repeated eigenvalue, this will determine an eigenvector up
to an arbitrary scalar multiple.
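
The elimination step of this procedure can be sketched as follows, using exact fractions so that no rounding occurs. This is an illustrative sketch, not the module's own algorithm: the function name `eigenvector` is hypothetical, it reduces the matrix fully (rather than stopping at upper triangular form), and it sets the first free unknown to $k = 1$. It is tried on the matrix of Example 14.

```python
from fractions import Fraction


def eigenvector(matrix, lam):
    """Return one non-zero solution of (matrix - lam*I) v = 0 by row reduction."""
    n = len(matrix)
    m = [[Fraction(matrix[i][j]) - (lam if i == j else 0) for j in range(n)]
         for i in range(n)]
    pivot_cols = []
    row = 0
    for col in range(n):
        # Look for a row at or below `row` with a non-zero entry in this column.
        pivot = next((r for r in range(row, n) if m[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot: this unknown is free
        m[row], m[pivot] = m[pivot], m[row]            # row interchange
        m[row] = [x / m[row][col] for x in m[row]]     # scale the pivot to 1
        for r in range(n):
            if r != row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[row])]
        pivot_cols.append(col)
        row += 1
    free_cols = [c for c in range(n) if c not in pivot_cols]
    if not free_cols:
        raise ValueError("lam is not an eigenvalue: only the zero solution exists")
    free = free_cols[0]
    v = [Fraction(0)] * n
    v[free] = Fraction(1)                 # the arbitrary value k, here k = 1
    for r, col in enumerate(pivot_cols):
        v[col] = -m[r][free]              # pivot unknown in terms of the free one
    return v


A = [[1, -2, 4], [-2, 7, -10], [-1, 4, -6]]
for lam in (-1, 1, 2):
    print(lam, [int(x) for x in eigenvector(A, lam)])
```

With $k = 1$ this gives $[-1 \;\; 1 \;\; 1]^T$, $[1 \;\; 2 \;\; 1]^T$ and $[0 \;\; 2 \;\; 1]^T$ for the eigenvalues −1, 1 and 2 respectively.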


Exercise 34 


Find the eigenvalues and corresponding eigenvectors of the following 
matrices. 


(a) $\begin{bmatrix} 2 & 1 & -1 \\ 0 & -3 & 2 \\ 0 & 0 & 4 \end{bmatrix}$
(b) $\begin{bmatrix} 0 & 2 & 0 \\ -2 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

Exercise 35 


Find the three eigenvectors of 


\[
A = \begin{bmatrix} 4 & 7 & 6 \\ 6 & 5 & 6 \\ -8 & -10 & -10 \end{bmatrix},
\]


given that the eigenvalues are —2, —1 and 2. 


When eigenvalues are repeated, there can be a number of subtleties,
which we do not cover in this module.


We say that the eigenvectors can be chosen to be real because
having found an eigenvector, we can always choose to multiply it
by an imaginary or complex number if we wish, and it will
still be an eigenvector.


Figure 18 Eigenvectors of a real symmetric matrix corresponding to
distinct (real) eigenvalues


3.8 Eigenvectors of real symmetric matrices


In Subsection 3.4 we defined a symmetric matrix as one that is equal to its 
own transpose. There we showed that the eigenvalues of a real symmetric 
matrix are real. Real symmetric matrices have several practical 
applications. For example, the matrix that describes the system of pipes 
and taps given in the Introduction is a real symmetric matrix. In this 
subsection we investigate the eigenvectors of real symmetric matrices. 


Given that the eigenvalues of a real symmetric matrix are necessarily real, 
it follows that the eigenvectors of such a matrix can always be chosen to be 
real. This is because the eigenvector equation that can be used to 
determine the eigenvectors contains only real elements, and the process of 
solving that equation will not introduce any imaginary quantities. 
However, reality is not the only interesting issue concerning the 
eigenvectors of real symmetric matrices. 


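In Exercise 32(b) we showed that the real symmetric matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$
has the (real) eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 3$, and corresponding
eigenvectors $\mathbf{v}_1 = [1 \;\; -1]^T$ and $\mathbf{v}_2 = [1 \;\; 1]^T$.


These eigenvectors are represented graphically in Figure 18 by the
conventional (geometric) vectors $\mathbf{v}_1 = \mathbf{i} - \mathbf{j}$ and $\mathbf{v}_2 = \mathbf{i} + \mathbf{j}$.


As you can see, these two vectors are at right angles. Thus their scalar
product is zero: $\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{i}\cdot\mathbf{i} - \mathbf{j}\cdot\mathbf{j} = 0$.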


In fact, we do not have to write the eigenvectors in terms of i and j to see 
this, since we can use matrix algebra to define the scalar product. Recall 
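that (in two dimensions) the scalar product of two vectors $\mathbf{v} = [v_1 \;\; v_2]^T$
and $\mathbf{w} = [w_1 \;\; w_2]^T$ in component form is $\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2$. However,
using matrix algebra we see that the scalar product is simply
\[
\mathbf{v}^T \mathbf{w} = \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = v_1 w_1 + v_2 w_2,
\]
and the vectors are perpendicular if $\mathbf{v}^T \mathbf{w} = 0$.


So for the two eigenvectors $\mathbf{v}_1 = [1 \;\; -1]^T$ and $\mathbf{v}_2 = [1 \;\; 1]^T$, the scalar
product is
\[
\mathbf{v}_1^T \mathbf{v}_2 = \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = (1 \times 1) + (-1 \times 1) = 0,
\]
and they are indeed perpendicular.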


Clearly, this definition of the scalar product works for row vectors of any
dimension. But in dimensions higher than 3, we tend to call the scalar
product the inner product, and if the inner product of two vectors
vanishes, we tend to say that they are orthogonal rather than
perpendicular. (This was mentioned in Subsection 2.1 of Unit 4.)


Inner products and orthogonality


Given any two column matrices $\mathbf{v}$ and $\mathbf{w}$, their inner product is given
by $\mathbf{v}^T \mathbf{w}$.


If $\mathbf{v}^T \mathbf{w} = 0$, then we say that $\mathbf{v}$ and $\mathbf{w}$ are orthogonal.
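
The definition above amounts to multiplying corresponding components and adding. A minimal sketch, with the hypothetical helper name `inner_product`, applied to the eigenvectors $[1 \;\; -1]^T$ and $[1 \;\; 1]^T$ found earlier:

```python
def inner_product(v, w):
    """Inner product v^T w of two column vectors given as lists."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(v, w))


v1, v2 = [1, -1], [1, 1]
print(inner_product(v1, v2))        # 0
print(inner_product(v1, v2) == 0)   # True: v1 and v2 are orthogonal
```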


Returning to the case of the real symmetric matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, the
orthogonality of the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ is not a coincidence. It turns
out that for any real symmetric matrix, eigenvectors that correspond to
distinct eigenvalues are always orthogonal. A proof of this is given in
Subsection 4.2, but you are not required to know it.


When eigenvalues are not distinct, it turns out that there are extra 
parameters in the eigenvectors, other than the scalar multiple, and we can 
always choose values for these parameters to make the eigenvectors 
orthogonal. This gives us the following powerful statement about 
eigenvectors of symmetric matrices. 


For any real symmetric n x n matrix $A$, it is always possible to find a
set of n real orthogonal eigenvectors of $A$.


We will use this result in Unit 7 to help us to classify the stationary points
of a function of two variables.


Exercise 36 


Find the inner product of each of the following pairs of column vectors, 
and hence determine which, if any, are orthogonal. 


(a) $\mathbf{s}_1 = [2 \;\; 1]^T$ and $\mathbf{s}_2 = [3 \;\; -6]^T$
(b) ty = (2 2 APP endty = [So =2- 0/7 


Exercise 37 


Calculate the eigenvectors of A = ; i , and show that they are 


orthogonal. 


Let us summarise our results for real symmetric matrices. 


Eigenvalues and eigenvectors of a real symmetric matrix 
For a real symmetric matrix: 
e the eigenvalues are real 


e the eigenvectors may be chosen to be real and orthogonal. 


For example, the 2 x 2 real symmetric matrix $A = I$ has two
unit eigenvalues, and the eigenvectors are of the form
$[x \;\; y]^T$ for any $x$ and any $y$. So we simply choose $x = 1$, $y = 0$
for $\mathbf{v}_1$ and $x = 0$, $y = 1$ for $\mathbf{v}_2$,
giving $\mathbf{v}_1 = [1 \;\; 0]^T$ and $\mathbf{v}_2 = [0 \;\; 1]^T$.


Figure 19 Paul Dirac (1902-1984), one of the founders of quantum
mechanics


Symmetric matrices: a complex perspective


Complex matrices are used in many applications, and are particularly 
important in quantum physics, which was developed in the 1920s by 
many people, including Paul Dirac (Figure 19). There, the closest 
analogue to a real symmetric matrix is known as a Hermitian matrix,
which is defined by the requirement that the matrix should equal the
complex conjugate of its own transpose (i.e. $(A^T)^* = A$, where the
star indicates complex conjugation of all the elements). Clearly, if a
Hermitian matrix has only real elements, then it is a symmetric
matrix. Hermitian matrices also have real eigenvalues. Indeed, the
proof (given in Subsection 4.2) that the eigenvalues of a real
symmetric matrix are real is directly applicable to the case of
Hermitian matrices. Nor does the similarity end there. An n x n
Hermitian matrix also possesses n orthogonal eigenvectors, though
they will generally be complex. Hermitian matrices, together with 
their eigenvalues and eigenvectors, are of fundamental importance in 
quantum physics, where they describe the dynamics of matter at the 
atomic and subatomic scale. 


4 Further results and proofs regarding 
real symmetric matrices 


The material in this section is optional, and will not be assessed. 


Subsection 4.1 extends our analysis of symmetric matrices, showing how 
one can use the orthogonality of eigenvectors to construct a simple formula 
for the eigenvector expansion. Subsection 4.2 contains the proofs that the 
eigenvalues of a real symmetric matrix are real and the eigenvectors are 
orthogonal. 


4.1 Real orthonormal bases 


In Subsection 2.4 we introduced the basis of a vector space as a 
generalisation of the idea of Cartesian unit vectors. However, Cartesian 
unit vectors are especially easy to work with because they are mutually 
orthogonal (implying i-j =j-k=i-k=0), and normalised (implying 
i-i=j-j=k-k=1). These properties are not shared by the basis of a 
general vector space. 


However, we have now seen that an n x n real symmetric matrix has n real
eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ that may be chosen to be mutually orthogonal
in the sense that $\mathbf{v}_i^T \mathbf{v}_j = 0$ for $i \neq j$. Moreover, each of those eigenvectors can be
normalised in a way that is directly analogous to the way in which we form
normalised Cartesian unit vectors (by dividing each vector by its own
magnitude). In the case of a real eigenvector represented by a column
vector $\mathbf{v}_i$, we define normalisation as follows.


Normalisation 


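Given a real eigenvector $\mathbf{v}_i$, the corresponding normalised
eigenvector $\hat{\mathbf{v}}_i$ is given by (see Subsection 1.4 of Unit 4)
\[
\hat{\mathbf{v}}_i = \frac{\mathbf{v}_i}{|\mathbf{v}_i|} = \frac{\mathbf{v}_i}{\sqrt{\mathbf{v}_i^T \mathbf{v}_i}}, \tag{22}
\]
and it satisfies the normalisation condition $\hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_i = 1$.


Here, the magnitude of $\mathbf{v}_i$ has been defined as $|\mathbf{v}_i| = \sqrt{\mathbf{v}_i^T \mathbf{v}_i}$.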
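
Normalisation as defined above is a one-line computation. The sketch below uses the hypothetical helper name `normalise` and an arbitrary illustrative vector $[3 \;\; 4]^T$, whose magnitude is 5.

```python
import math


def normalise(v):
    """Divide a real vector by its magnitude sqrt(v^T v)."""
    magnitude = math.sqrt(sum(x * x for x in v))
    if magnitude == 0:
        raise ValueError("the zero vector cannot be normalised")
    return [x / magnitude for x in v]


v_hat = normalise([3, 4])
print(v_hat)                          # approximately [0.6, 0.8]
print(sum(x * x for x in v_hat))      # approximately 1 (normalisation condition)
```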


Exercise 38 


In Exercise 37 you showed that the matrix $A$ given there has orthogonal
eigenvectors $\mathbf{v}_1 = [2 \;\; 1]^T$ and $\mathbf{v}_2 = [1 \;\; -2]^T$. Write down the
corresponding normalised eigenvectors.


A set of vectors in which each element is normalised as well as being 
orthogonal to all the other vectors belonging to the set, is said to be 
orthonormal; e.g. i, j and k are orthonormal. We therefore arrive at the 
following important conclusion regarding the eigenvectors of a real 
symmetric matrix. 


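Given any n x n real symmetric matrix, it is always possible to find a
set of n real orthonormal eigenvectors $\hat{\mathbf{v}}_1, \hat{\mathbf{v}}_2, \ldots, \hat{\mathbf{v}}_n$ for which
\[
\hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}
\]

Such a set is linearly independent, and therefore forms an
orthonormal basis for the n-dimensional vector space.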


It follows from this that for any n-dimensional column vector $\mathbf{v}$, we have
the eigenvector expansion
\[
\mathbf{v} = \alpha_1 \hat{\mathbf{v}}_1 + \alpha_2 \hat{\mathbf{v}}_2 + \cdots + \alpha_n \hat{\mathbf{v}}_n. \tag{23}
\]

We have seen eigenvector expansions before, but in those earlier cases the
eigenvectors concerned were just linearly independent, not orthonormal. So
working out the scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$ was not easy and was largely avoided
except for $n = 2$. However, orthonormality makes things much simpler. If
we matrix multiply each side of equation (23) on the left by $\hat{\mathbf{v}}_i^T$, we get
\[
\hat{\mathbf{v}}_i^T \mathbf{v} = \alpha_1 \hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_1 + \alpha_2 \hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_2 + \cdots + \alpha_n \hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_n.
\]
Then, using the orthogonality of the normalised eigenvectors, all of the
inner products on the right-hand side (i.e. all the matrix products of the
form $\hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_j$) must vanish, apart from the one for which $j = i$. Thus
\[
\hat{\mathbf{v}}_i^T \mathbf{v} = \alpha_i \hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_i.
\]
We can now use the fact that each of the orthonormal eigenvectors has
magnitude 1 (i.e. $\hat{\mathbf{v}}_i^T \hat{\mathbf{v}}_i = 1$) to write
\[
\hat{\mathbf{v}}_i^T \mathbf{v} = \alpha_i.
\]
This equation gives all the coefficients $\alpha_i$ for $i = 1, 2, \ldots, n$. Hence
equation (23) can be rewritten as follows.


Real orthonormal eigenvector expansion 


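\[
\mathbf{v} = (\hat{\mathbf{v}}_1^T \mathbf{v})\hat{\mathbf{v}}_1 + (\hat{\mathbf{v}}_2^T \mathbf{v})\hat{\mathbf{v}}_2 + \cdots + (\hat{\mathbf{v}}_n^T \mathbf{v})\hat{\mathbf{v}}_n. \tag{24}
\]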


This is a very useful expression that enables us to represent a vector as an 
eigenvector expansion of a real symmetric matrix. 


Of course, all of this is expressed in the rather general language of linear
algebra, but it should be clear that the real orthonormal eigenvectors $\hat{\mathbf{v}}_i$
are nothing more than generalisations of the familiar Cartesian unit
vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, which are also orthonormal. And equation (24) is simply
a generalisation of $\mathbf{v} = v_x\mathbf{i} + v_y\mathbf{j} + v_z\mathbf{k}$, where $v_x = \mathbf{i}\cdot\mathbf{v}$, $v_y = \mathbf{j}\cdot\mathbf{v}$ and
$v_z = \mathbf{k}\cdot\mathbf{v}$.


Here is a numerical application of equation (24). 


Example 15 


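The real symmetric matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ has eigenvectors $\mathbf{v}_1 = [1 \;\; -1]^T$
and $\mathbf{v}_2 = [1 \;\; 1]^T$, corresponding to the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 3$.

(a) Find the normalised orthogonal eigenvectors $\hat{\mathbf{v}}_1$ and $\hat{\mathbf{v}}_2$.

(b) Express the vector $\mathbf{v} = [3 \;\; -2]^T$ as an eigenvector expansion using the
orthonormal basis consisting of $\hat{\mathbf{v}}_1$ and $\hat{\mathbf{v}}_2$.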


Solution 


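(a) Since $\mathbf{v}_1 = [1 \;\; -1]^T$ and $\mathbf{v}_2 = [1 \;\; 1]^T$ are already orthogonal, we only
need to normalise them:
\[
\hat{\mathbf{v}}_1 = \frac{\mathbf{v}_1}{\sqrt{\mathbf{v}_1^T \mathbf{v}_1}} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad
\hat{\mathbf{v}}_2 = \frac{\mathbf{v}_2}{\sqrt{\mathbf{v}_2^T \mathbf{v}_2}} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]

(b) From equation (24), $\mathbf{v} = (\hat{\mathbf{v}}_1^T \mathbf{v})\hat{\mathbf{v}}_1 + (\hat{\mathbf{v}}_2^T \mathbf{v})\hat{\mathbf{v}}_2$. Calculating the
coefficients, we have
\[
\hat{\mathbf{v}}_1^T \mathbf{v} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} = \frac{5}{\sqrt{2}}
\quad\text{and}\quad
\hat{\mathbf{v}}_2^T \mathbf{v} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} = \frac{1}{\sqrt{2}}.
\]

Hence the eigenvector expansion is
\[
\mathbf{v} = \frac{5}{\sqrt{2}}\hat{\mathbf{v}}_1 + \frac{1}{\sqrt{2}}\hat{\mathbf{v}}_2.
\]

We can easily check this by substituting the values for $\hat{\mathbf{v}}_1$ and $\hat{\mathbf{v}}_2$:
\[
\mathbf{v} = \frac{5}{2}\begin{bmatrix} 1 \\ -1 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ -2 \end{bmatrix},
\]
as required.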
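
The calculation in this example can be sketched numerically. The helper name `expand` is a hypothetical choice; it computes each coefficient as an inner product with a basis vector, as in equation (24), and then rebuilds the vector from those coefficients.

```python
import math


def expand(v, basis):
    """Coefficients and reconstruction of v in an orthonormal basis."""
    # Each coefficient is the inner product of a basis vector with v.
    coeffs = [sum(b_k * v_k for b_k, v_k in zip(b, v)) for b in basis]
    # Rebuild v as the sum of coefficient times basis vector.
    rebuilt = [sum(c * b[k] for c, b in zip(coeffs, basis))
               for k in range(len(v))]
    return coeffs, rebuilt


s = 1 / math.sqrt(2)
basis = [[s, -s], [s, s]]               # normalised eigenvectors from part (a)
coeffs, rebuilt = expand([3, -2], basis)
print(coeffs)     # approximately [5/sqrt(2), 1/sqrt(2)]
print(rebuilt)    # approximately [3, -2]
```

Because floating-point arithmetic is used, the printed values agree with the exact ones only up to rounding.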


Exercise 39 


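In Exercise 38 you showed that $\hat{\mathbf{v}}_1 = \frac{1}{\sqrt{5}}[2 \;\; 1]^T$ and $\hat{\mathbf{v}}_2 = \frac{1}{\sqrt{5}}[1 \;\; -2]^T$ are
orthonormal eigenvectors of the matrix $A$ of Exercise 37. Express the vector
$\mathbf{v} = [1 \;\; 1]^T$ in terms of this basis.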


4.2 Proofs concerning real symmetric matrices 


Proof that for real symmetric matrices the eigenvalues are 
real 


Throughout the proof, stars are used to indicate complex conjugation. So
given a complex quantity $z = a + ib$, its complex conjugate is
$z^* = (a + ib)^* = (a - ib)$. Note that $z^* z = a^2 + b^2 = |z|^2$. The use of a star
to indicate complex conjugation is a very common alternative to the
overline that is used elsewhere in the module.


Let $A$ be a real symmetric matrix, and let $\lambda$ be one of its eigenvalues that
corresponds to an eigenvector $\mathbf{v}$. In such a situation,
\[
A\mathbf{v} = \lambda\mathbf{v}. \tag{25}
\]
We know that the elements of $A$ are real, but at this stage we cannot be
sure that either $\lambda$ or $\mathbf{v}$ is necessarily real. Taking the transpose of both
sides of equation (25) gives
\[
\mathbf{v}^T A^T = \lambda\mathbf{v}^T.
\]
(The rule for taking the transpose of a product of matrices was discussed in Unit 4.)

Taking the complex conjugate of both sides then gives
\[
(\mathbf{v}^T)^* (A^T)^* = \lambda^* (\mathbf{v}^T)^*.
\]
But $A$ is both real and symmetric, so $(A^T)^* = A$, giving
\[
(\mathbf{v}^T)^* A = \lambda^* (\mathbf{v}^T)^*.
\]
Matrix multiplying each side of this equation on the right by $\mathbf{v}$ then gives
\[
(\mathbf{v}^T)^* A\mathbf{v} = \lambda^* (\mathbf{v}^T)^* \mathbf{v}.
\]
On the other hand, matrix multiplying each side of equation (25) on the
left by $(\mathbf{v}^T)^*$ gives
\[
(\mathbf{v}^T)^* A\mathbf{v} = \lambda (\mathbf{v}^T)^* \mathbf{v}.
\]

Subtracting these last two equations gives
\[
0 = (\lambda - \lambda^*)(\mathbf{v}^T)^* \mathbf{v}. \tag{26}
\]
However, if $\mathbf{v} = [v_1 \;\; v_2 \;\; \ldots \;\; v_n]^T$, then
\[
(\mathbf{v}^T)^* \mathbf{v} = v_1^* v_1 + v_2^* v_2 + \cdots + v_n^* v_n = |v_1|^2 + |v_2|^2 + \cdots + |v_n|^2,
\]
which must be a positive quantity and non-zero because eigenvectors are
always non-zero. Since $(\mathbf{v}^T)^* \mathbf{v}$ is positive, it follows from equation (26)
that
\[
0 = \lambda - \lambda^*,
\]
from which we see that $\lambda = \lambda^*$, so $\lambda$ must be real.
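

Proof that for real symmetric matrices the eigenvectors
corresponding to distinct eigenvalues are orthogonal


Given that $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$ and $A\mathbf{v}_j = \lambda_j\mathbf{v}_j$, taking the transpose of the
equation involving $\mathbf{v}_i$ gives
\[
\mathbf{v}_i^T A^T = \lambda_i \mathbf{v}_i^T,
\]
and matrix multiplying each side of this on the right by $\mathbf{v}_j$ gives
\[
\mathbf{v}_i^T A^T \mathbf{v}_j = \lambda_i \mathbf{v}_i^T \mathbf{v}_j.
\]
Since $A$ is symmetric, so that $A^T = A$, this may be rewritten as
\[
\mathbf{v}_i^T A \mathbf{v}_j = \lambda_i \mathbf{v}_i^T \mathbf{v}_j.
\]

On the other hand, starting from the equation $A\mathbf{v}_j = \lambda_j\mathbf{v}_j$, and matrix
multiplying each side on the left by $\mathbf{v}_i^T$, we see that
\[
\mathbf{v}_i^T A \mathbf{v}_j = \lambda_j \mathbf{v}_i^T \mathbf{v}_j.
\]
Subtracting the last two equations, we find
\[
0 = (\lambda_i - \lambda_j)\mathbf{v}_i^T \mathbf{v}_j. \tag{27}
\]

However, $\lambda_i$ and $\lambda_j$ are distinct, so $\lambda_i - \lambda_j \neq 0$. It therefore follows from
equation (27) that $\mathbf{v}_i^T \mathbf{v}_j = 0$, which is just the condition for the real
eigenvectors $\mathbf{v}_i$ and $\mathbf{v}_j$ to be orthogonal.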


Learning outcomes 




After studying this unit, you should be able to do the following. 


Solve 2 x 2 and 3 x 3 systems of linear equations by the Gaussian 
elimination method, using row interchanges where necessary. 


Determine whether such a system of equations has no solution, a 
unique solution, or an infinity of solutions. 


Explain the meaning of the terms eigenvector, eigenvalue, linearly 
independent and basis. 


Understand how repeated application of a matrix to a vector becomes 
proportional to the eigenvector with largest modulus eigenvalue. 


Use the characteristic equation and eigenvector equation to calculate 
the eigenvalues and eigenvectors of a 2 x 2 matrix. 


Calculate the eigenvalues and eigenvectors of a 3 x 3 matrix, where 
one of the eigenvalues is obvious or given. 


Appreciate that an n x n matrix with n distinct eigenvalues gives rise 
to n linearly independent eigenvectors that can be used as a basis. 


Know how to calculate an eigenvector expansion of a given vector. 


Appreciate that the eigenvalues of a matrix may be real or complex, 
and may be distinct or repeated. 


Appreciate that the complex eigenvalues and corresponding 
eigenvectors of real matrices occur in complex conjugate pairs. 


Write down the eigenvalues of a triangular matrix. 


Recall that the sum of the eigenvalues of a matrix A is tr A, and that 
the product is det A, and use these properties to help to check and 
determine eigenvalues. 


Know that a real symmetric matrix has real eigenvalues and that the 
eigenvectors can be chosen to be real and orthogonal.


Solutions to exercises


Solution to Exercise 1 


Yes, the Cartesian equation of a plane is linear in x, y and z. This can be 
seen by comparing the Cartesian equation with the general linear equation 
for $n = 3$ and making the identifications $a_1 = a$, $x_1 = x$, $a_2 = b$, $x_2 = y$,
$a_3 = c$, $x_3 = z$ and $d = 0$.


Solution to Exercise 2 


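(a) The required matrix form is
\[
\begin{bmatrix} 1 & 1 & -1 \\ 5 & 2 & 2 \\ 4 & -2 & -3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 20 \\ 15 \end{bmatrix}.
\]

(b) The equations are just a messy rearrangement of those in part (a), so
after further rearrangement to return them to the form given in
part (a), we get the same answer as before.

(c) Remembering to insert zero coefficients in place of missing terms, the
required matrix form is
\[
\begin{bmatrix} 2 & 3 & -4 \\ 2 & 0 & 3 \\ 0 & 6 & -2 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix}.
\]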


Solution to Exercise 3 


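The augmented matrix is
\[
\left[\begin{array}{rrr|r} 1 & 1 & -1 & 2 \\ 5 & 2 & 2 & 20 \\ 4 & -2 & -3 & 15 \end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}.
\]

First, we reduce the elements below the leading diagonal in column 1 to
zero:
\[
\begin{array}{l} \\ R_2 - 5R_1 \\ R_3 - 4R_1 \end{array}
\left[\begin{array}{rrr|r} 1 & 1 & -1 & 2 \\ 0 & -3 & 7 & 10 \\ 0 & -6 & 1 & 7 \end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}.
\]
Then we reduce the element below the leading diagonal in column 2 to
zero:
\[
\begin{array}{l} \\ \\ R_{3a} - 2R_{2a} \end{array}
\left[\begin{array}{rrr|r} 1 & 1 & -1 & 2 \\ 0 & -3 & 7 & 10 \\ 0 & 0 & -13 & -13 \end{array}\right].
\]

The equations represented by the new matrix are
\[
\begin{aligned}
x_1 + x_2 - x_3 &= 2, \\
-3x_2 + 7x_3 &= 10, \\
-13x_3 &= -13,
\end{aligned}
\]
which give $x_3 = 1$, $x_2 = \tfrac{1}{3}(7x_3 - 10) = -1$ and $x_1 = 2 - x_2 + x_3 = 4$.

We verify the solution as follows:
\[
\begin{bmatrix} 1 & 1 & -1 \\ 5 & 2 & 2 \\ 4 & -2 & -3 \end{bmatrix}
\begin{bmatrix} 4 \\ -1 \\ 1 \end{bmatrix} =
\begin{bmatrix} 4 - 1 - 1 \\ 20 - 2 + 2 \\ 16 + 2 - 3 \end{bmatrix} =
\begin{bmatrix} 2 \\ 20 \\ 15 \end{bmatrix}.
\]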

Solution to Exercise 4 


Let the salaries (in thousands of rurs) of managers, software engineers and
clerks at a bank be $x_i$, where $i = 1, 2, 3$, respectively. The problem is then
specified by the simultaneous equations
\[
\begin{aligned}
3x_1 + 2x_2 + 24x_3 &= 137, \\
x_1 + x_2 + 26x_3 &= 137, \\
3x_2 + 25x_3 &= 137.
\end{aligned}
\]

This problem is unaffected by exchanging the order of the first two
equations, to obtain the augmented matrix
\[
\left[\begin{array}{rrr|r} 1 & 1 & 26 & 137 \\ 3 & 2 & 24 & 137 \\ 0 & 3 & 25 & 137 \end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array},
\]
from which we obtain
\[
\begin{array}{l} \\ R_2 - 3R_1 \\ \\ \end{array}
\left[\begin{array}{rrr|r} 1 & 1 & 26 & 137 \\ 0 & -1 & -54 & -274 \\ 0 & 3 & 25 & 137 \end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]
without introducing fractional elements. We complete the elimination
stage as follows:
\[
\begin{array}{l} \\ \\ R_{3a} + 3R_{2a} \end{array}
\left[\begin{array}{rrr|r} 1 & 1 & 26 & 137 \\ 0 & -1 & -54 & -274 \\ 0 & 0 & -137 & -685 \end{array}\right].
\]

From this we see that $x_3 = 5$ for clerks, and back substitution gives
$x_2 = 274 - 54x_3 = 4$ for software engineers, and $x_1 = 137 - x_2 - 26x_3 = 3$
for managers. Thus the numbers required are given by the vector
$\mathbf{x} = [3 \;\; 4 \;\; 5]^T$, which can be confirmed as a solution of the original
matrix equation.


So in Ruritania, clerks receive 5000 rurs per month, software engineers 
receive 4000 rurs per month, and managers receive 3000 rurs per month. 




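Solution to Exercise 5

Starting with

\[
\left[\begin{array}{rrr|r}
0 & 0 & 1 & 2 \\
2 & 1 & 0 & 5 \\
1 & 2 & 0 & 7
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

we eliminate $x_1$ from the second row, to obtain

\[
\begin{array}{l} {} \\ R_2 - 2R_3 \\ {} \end{array}
\left[\begin{array}{rrr|r}
0 & 0 & 1 & 2 \\
0 & -3 & 0 & -9 \\
1 & 2 & 0 & 7
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_3 \end{array}
\]

Then $R_1$ gives $x_3 = 2$, $R_{2a}$ gives $x_2 = 3$, and $R_3$ gives $x_1 = 7 - 2x_2 = 1$.
Hence the solution is $\mathbf{x} = [1 \;\; 3 \;\; 2]^T$. (As usual, it should be confirmed
that the original matrix equation is satisfied by this solution.)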


Solution to Exercise 6 


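Starting with

\[
\left[\begin{array}{rrrr|r}
1 & 1 & 1 & 1 & 10 \\
1 & 1 & 2 & 1 & 12 \\
1 & 3 & 1 & 1 & 16 \\
4 & 1 & 1 & 1 & 22
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \\ R_4 \end{array}
\]

we eliminate $x_1$ from the last three rows, to obtain

\[
\begin{array}{l} {} \\ R_2 - R_1 \\ R_3 - R_1 \\ R_4 - 4R_1 \end{array}
\left[\begin{array}{rrrr|r}
1 & 1 & 1 & 1 & 10 \\
0 & 0 & 1 & 0 & 2 \\
0 & 2 & 0 & 0 & 6 \\
0 & -3 & -3 & -3 & -18
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \\ R_{4a} \end{array}
\]

We could interchange rows to obtain an upper triangular augmented
matrix, but it should also be clear that $R_{2a}$ gives $x_3 = 2$, $R_{3a}$ gives
$x_2 = 3$, $R_{4a}$ gives $x_4 = 6 - x_2 - x_3 = 1$, and $R_1$ gives
$x_1 = 10 - x_2 - x_3 - x_4 = 4$. (As usual, it should be confirmed that the
original matrix equation is satisfied by the solution $\mathbf{x} = [4 \;\; 3 \;\; 2 \;\; 1]^T$.)
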
Solution to Exercise 7 

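(a) From the augmented matrix

\[
\left[\begin{array}{rrr|r}
1 & -2 & 5 & 7 \\
1 & 3 & -4 & 20 \\
1 & 18 & -31 & 40
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

we obtain

\[
\begin{array}{l} {} \\ R_2 - R_1 \\ R_3 - R_1 \end{array}
\left[\begin{array}{rrr|r}
1 & -2 & 5 & 7 \\
0 & 5 & -9 & 13 \\
0 & 20 & -36 & 33
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

and

\[
\begin{array}{l} {} \\ {} \\ R_{3a} - 4R_{2a} \end{array}
\left[\begin{array}{rrr|r}
1 & -2 & 5 & 7 \\
0 & 5 & -9 & 13 \\
0 & 0 & 0 & -19
\end{array}\right]
\]

The third equation reads $0 = -19$, which is impossible. So the
equations are inconsistent and there is no solution.

(b) From the augmented matrix

\[
\left[\begin{array}{rrr|r}
1 & -2 & 5 & 6 \\
1 & 3 & -4 & 7 \\
2 & 6 & -12 & 12
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

we obtain

\[
\begin{array}{l} {} \\ R_2 - R_1 \\ R_3 - 2R_1 \end{array}
\left[\begin{array}{rrr|r}
1 & -2 & 5 & 6 \\
0 & 5 & -9 & 1 \\
0 & 10 & -22 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

and

\[
\begin{array}{l} {} \\ {} \\ R_{3a} - 2R_{2a} \end{array}
\left[\begin{array}{rrr|r}
1 & -2 & 5 & 6 \\
0 & 5 & -9 & 1 \\
0 & 0 & -4 & -2
\end{array}\right]
\]

then back substitution gives a unique solution.

(c) From the augmented matrix

\[
\left[\begin{array}{rrr|r}
1 & -4 & 1 & 14 \\
5 & -1 & -1 & 2 \\
6 & 14 & -6 & -52
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

we obtain

\[
\begin{array}{l} {} \\ R_2 - 5R_1 \\ R_3 - 6R_1 \end{array}
\left[\begin{array}{rrr|r}
1 & -4 & 1 & 14 \\
0 & 19 & -6 & -68 \\
0 & 38 & -12 & -136
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

and

\[
\begin{array}{l} {} \\ {} \\ R_{3a} - 2R_{2a} \end{array}
\left[\begin{array}{rrr|r}
1 & -4 & 1 & 14 \\
0 & 19 & -6 & -68 \\
0 & 0 & 0 & 0
\end{array}\right]
\]

The third equation reads $0 = 0$, which is true but does not provide
any limitation on the possible value of $x_3$. So there is an infinity of
solutions. In this case we may assign $x_3$ any real value that
we choose, say $x_3 = p$. Then from the second equation we get
$x_2 = \tfrac{1}{19}(-68 + 6p)$, and from the first equation we get
$x_1 = 14 + 4x_2 - x_3 = \tfrac{1}{19}(-6 + 5p)$.

The existence of an infinity of solutions is indicated by the infinity of
possible choices for p.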
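The three outcomes in this solution (no solution, a unique solution, an infinity of solutions) can be distinguished by counting the non-zero rows left after elimination, once for the coefficient matrix and once for the augmented matrix. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, and the coefficient rows are those of the augmented matrices as reconstructed in this solution.

```python
from fractions import Fraction


def nonzero_rows_after_elimination(rows):
    """Count the non-zero rows left after Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    count = 0  # number of pivot rows found so far
    for col in range(len(M[0])):
        pivot = next((r for r in range(count, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        M[count], M[pivot] = M[pivot], M[count]
        for r in range(count + 1, len(M)):
            factor = M[r][col] / M[count][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[count])]
        count += 1
    return count


def classify(A, b):
    """Classify the linear system Ax = b by its number of solutions."""
    rows_A = nonzero_rows_after_elimination(A)
    rows_aug = nonzero_rows_after_elimination(
        [list(row) + [rhs] for row, rhs in zip(A, b)]
    )
    if rows_A < rows_aug:
        return "no solution"  # a row reads 0 = non-zero
    if rows_A == len(A[0]):
        return "unique solution"
    return "infinitely many solutions"


print(classify([[1, -2, 5], [1, 3, -4], [1, 18, -31]], [7, 20, 40]))
print(classify([[1, -2, 5], [1, 3, -4], [2, 6, -12]], [6, 7, 12]))
print(classify([[1, -4, 1], [5, -1, -1], [6, 14, -6]], [14, 2, -52]))
```

Run on parts (a), (b) and (c) in turn, the three calls report no solution, a unique solution and infinitely many solutions, matching the hand calculations.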
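Solution to Exercise 8

Matrix multiplication shows that the column vectors given in (b) and (d)
are both eigenvectors (corresponding to the eigenvalue 1), but those given
in (a) and (c) are not. The results of the matrix multiplications in the four
cases are as follows.

(a) $\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} 1000 \\ 1000 \end{bmatrix} = \begin{bmatrix} 1100 \\ 900 \end{bmatrix}$

(b) $\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} 120 \\ 60 \end{bmatrix} = \begin{bmatrix} 120 \\ 60 \end{bmatrix}$

(c) $\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} 500 \\ 300 \end{bmatrix} = \begin{bmatrix} 510 \\ 290 \end{bmatrix}$

(d) $\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} 20 \\ 10 \end{bmatrix} = \begin{bmatrix} 20 \\ 10 \end{bmatrix}$
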
Solution to Exercise 9 


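The scaled vector $k\mathbf{w}$ is represented by the column vector
$\mathbf{v} = k\mathbf{w} = [k \;\; k]^T$, so

\[
A\mathbf{v} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} k \\ k \end{bmatrix}
= \begin{bmatrix} 5k \\ 5k \end{bmatrix} = 5\mathbf{v}.
\]

So A does indeed map $k\mathbf{w}$ to $5k\mathbf{w}$. This shows that any scaled vector
$\mathbf{v} = k\mathbf{w}$ is an eigenvector of A corresponding to the eigenvalue 5.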


Solution to Exercise 10 
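(a) Multiplying the given matrix by the vector gives
$\begin{bmatrix} 12 \\ 8 \end{bmatrix} = 4 \begin{bmatrix} 3 \\ 2 \end{bmatrix}$,
so $[3 \;\; 2]^T$ is an eigenvector with eigenvalue 4.

(b) Multiplying the given matrix by the vector gives
$\begin{bmatrix} -1 \\ 1 \end{bmatrix} = -1 \begin{bmatrix} 1 \\ -1 \end{bmatrix}$,
so $[1 \;\; -1]^T$ is an eigenvector with eigenvalue $-1$.

(c) Multiplying the given matrix by the vector gives
$\begin{bmatrix} 0 \\ 12 \end{bmatrix} = 2 \begin{bmatrix} 0 \\ 6 \end{bmatrix}$,
so $[0 \;\; 6]^T$ is an eigenvector with eigenvalue 2.

(d) Multiplying the given matrix by the vector gives
$\begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ -2 \end{bmatrix}$,
so $[1 \;\; -2]^T$ is an eigenvector with eigenvalue 0.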
Solution to Exercise 11

Matrix multiplication gives $A\mathbf{v}_1 = (1 + 2i)\mathbf{v}_1$.
So $1 + 2i$ is an eigenvalue and corresponds to eigenvector $\mathbf{v}_1$.

Similarly, $A\mathbf{v}_2 = (1 - 2i)\mathbf{v}_2$.
So $1 - 2i$ is an eigenvalue and corresponds to eigenvector $\mathbf{v}_2$.


Solution to Exercise 12 


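The eigenvectors act along the line of reflection $y = x$ and perpendicular
to it, so they are the scalar multiples of $[1 \;\; 1]^T$ and $[1 \;\; -1]^T$. The vector
$[1 \;\; 1]^T$ is scaled by a factor of 1 by the transformation, while for $[1 \;\; -1]^T$
the scale factor is $-1$; these scale factors are the corresponding eigenvalues.

We may check our conclusion by evaluating

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\]

so $[1 \;\; 1]^T$ corresponds to the eigenvalue 1, and

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} = -\begin{bmatrix} 1 \\ -1 \end{bmatrix},
\]

so $[1 \;\; -1]^T$ corresponds to the eigenvalue $-1$.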


Solution to Exercise 13 


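Explicit matrix multiplication shows that

\[
\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix},
\]

so $[2 \;\; 1]^T$ is an eigenvector with eigenvalue 1.

Similarly,

\[
\begin{bmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 0.7 \begin{bmatrix} 1 \\ -1 \end{bmatrix},
\]

so $[1 \;\; -1]^T$ is an eigenvector with eigenvalue 0.7.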


Solution to Exercise 14 


If $a_1 \neq 0$, then rearranging equation (6) gives


Solution to Exercise 15


(a) i, j and k are certainly linearly independent, since they are not coplanar.


(b) These vectors are linearly dependent. The first vector is —1 times the 
second vector, so those two vectors are antiparallel, and the three 
vectors are coplanar. Also, the system fails the test for linear 
independence since 


\[ a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 = \mathbf{0} \]

is satisfied, for example, by the values $a_1 = 1$, $a_2 = -1$, $a_3 = 0$, which
are not all zero.

Solution to Exercise 16 


These vectors are linearly dependent, as there cannot be more than three 
linearly independent vectors in a three-dimensional space. 


In fact, the vectors are related as follows: v4 = —$v1 + $v2 + $v3. 


Solution to Exercise 17 


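Setting $\mathbf{v} = \alpha\mathbf{v}_1 + \beta\mathbf{v}_2$, we have

\[
\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix},
\]

from which we get the simultaneous linear equations

\begin{align*}
1 &= \alpha - 2\beta, \\
3 &= \alpha + \beta.
\end{align*}

Solving these, we obtain $\alpha = 7/3$ and $\beta = 2/3$.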


Solution to Exercise 18 


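From the definition of linear independence (see equation (6)), we need to
show that the only solution of $a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3 = \mathbf{0}$ is
$a_1 = a_2 = a_3 = 0$.

We have

\[
a_1 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} + a_2 \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + a_3 \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]

This gives a system of three equations for $a_1, a_2, a_3$, which we put in
augmented matrix form:

\[
\left[\begin{array}{rrr|r}
-1 & 1 & 0 & 0 \\
1 & 2 & 2 & 0 \\
1 & 1 & 1 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

We now solve these equations by Gaussian elimination:

\[
\begin{array}{l} {} \\ R_2 + R_1 \\ R_3 + R_1 \end{array}
\left[\begin{array}{rrr|r}
-1 & 1 & 0 & 0 \\
0 & 3 & 2 & 0 \\
0 & 2 & 1 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

\[
\begin{array}{l} {} \\ {} \\ R_{3a} - \tfrac{2}{3}R_{2a} \end{array}
\left[\begin{array}{rrr|r}
-1 & 1 & 0 & 0 \\
0 & 3 & 2 & 0 \\
0 & 0 & -\tfrac{1}{3} & 0
\end{array}\right]
\]

Back substitution gives $a_1 = a_2 = a_3 = 0$, hence the eigenvectors are
linearly independent.
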
Solution to Exercise 19 


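(a) The eigenvectors correspond to different eigenvalues, so are linearly
independent. This is also obvious because they are not collinear. They
therefore form a basis for two-dimensional vectors, so we can write

\[
\mathbf{v} = \begin{bmatrix} -2 \\ 4 \end{bmatrix} = c_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} -2 \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 - 2c_2 \\ c_1 + c_2 \end{bmatrix}.
\]

Equating corresponding elements of the column vectors on the right
and the left shows that

\[ c_1 - 2c_2 = -2 \quad\text{and}\quad c_1 + c_2 = 4. \]

Solving this simple system of linear equations, we see that $c_1 = 2$ and
$c_2 = 2$. Thus

\[
\mathbf{v} = 2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2 \begin{bmatrix} -2 \\ 1 \end{bmatrix}.
\]

(b) Using the eigenvector expansion gives

\[
A\mathbf{v} = A\left( 2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2 \begin{bmatrix} -2 \\ 1 \end{bmatrix} \right)
= 2 \times 5 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2 \times 2 \begin{bmatrix} -2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 14 \end{bmatrix}.
\]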


Solution to Exercise 20 


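From the solution to Exercise 19, we have

\[
\mathbf{x}_0 = \begin{bmatrix} -2 \\ 4 \end{bmatrix} = 2\mathbf{v}_1 + 2\mathbf{v}_2,
\]

where $\mathbf{v}_1 = [1 \;\; 1]^T$ and $\mathbf{v}_2 = [-2 \;\; 1]^T$ are the eigenvectors of A that
correspond to the real eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 2$. Hence

\begin{align*}
\mathbf{x}_8 = A^8\mathbf{x}_0 &= 2A^8\mathbf{v}_1 + 2A^8\mathbf{v}_2 \\
&= 2\lambda_1^8\mathbf{v}_1 + 2\lambda_2^8\mathbf{v}_2 \\
&= 2(5^8) \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2(2^8) \begin{bmatrix} -2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 780\,226 \\ 781\,762 \end{bmatrix}.
\end{align*}

Thus to two significant figures,

\[
\mathbf{x}_8 \simeq \begin{bmatrix} 78 \times 10^4 \\ 78 \times 10^4 \end{bmatrix} = 78 \times 10^4 \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]

Similarly,

\[
\mathbf{x}_9 \simeq \begin{bmatrix} 39 \times 10^5 \\ 39 \times 10^5 \end{bmatrix} = 39 \times 10^5 \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\qquad
\mathbf{x}_{10} \simeq \begin{bmatrix} 20 \times 10^6 \\ 20 \times 10^6 \end{bmatrix} = 20 \times 10^6 \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]

The significance of these results is that, working to two significant figures,
the contribution of the eigenvector corresponding to the smaller of the two
eigenvalues is negligible in the eighth, ninth and tenth iterations. So to
two significant figures, $A^k\mathbf{x}_0 \simeq 2\lambda_1^k\mathbf{v}_1$ for $k \geq 8$.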
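The dominance of the larger eigenvalue can be checked numerically. The short sketch below repeatedly applies the matrix $A$ with rows $[3 \;\; 2]$ and $[1 \;\; 4]$ to $\mathbf{x}_0$ and compares the exact iterates with the approximation $2\lambda_1^k\mathbf{v}_1$; the helper names `mat_vec` and `iterate` are illustrative choices, not part of the text.

```python
def mat_vec(A, v):
    """Multiply a square matrix (a list of rows) by a column vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def iterate(A, x0, k):
    """Return A^k x0 by applying A to the vector k times."""
    x = list(x0)
    for _ in range(k):
        x = mat_vec(A, x)
    return x


A = [[3, 2], [1, 4]]  # eigenvalues 5 and 2
x0 = [-2, 4]          # x0 = 2*v1 + 2*v2 with v1 = [1, 1], v2 = [-2, 1]

for k in (8, 9, 10):
    exact = iterate(A, x0, k)
    dominant = [2 * 5**k, 2 * 5**k]  # 2 * lambda_1^k * v1
    rel_err = max(abs(e - d) / abs(e) for e, d in zip(exact, dominant))
    print(k, exact, dominant, f"relative error {rel_err:.1e}")
```

Integer arithmetic keeps the iterates exact, so the printed relative error measures only the neglected contribution $2\lambda_2^k\mathbf{v}_2$.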


Solution to Exercise 21 


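With $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ we obtain

\[
\det(A - \lambda I) = \begin{vmatrix} 3 - \lambda & 2 \\ 1 & 4 - \lambda \end{vmatrix} = (3 - \lambda)(4 - \lambda) - 2,
\]

so the characteristic equation may be written as $\lambda^2 - 7\lambda + 10 = 0$, the
roots of which, i.e. the eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 5$, may be found using
the formula or by factorising.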


Solution to Exercise 22 
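(a) The characteristic equation of G gives

\[ \det(G - \lambda I) = (2 - \lambda)(3 - \lambda) - 2 = 0. \]

This becomes $\lambda^2 - 5\lambda + 4 = 0$. Solving this using the standard
formula, we get

\[ \lambda = \frac{5 \pm \sqrt{25 - 16}}{2} = \frac{5 \pm 3}{2}. \]

So the eigenvalues of G are 1 and 4.

(b) The characteristic equation of H gives

\[ \det(H - \lambda I) = (2 - \lambda)(2 - \lambda) + 2 = 0. \]

This becomes $\lambda^2 - 4\lambda + 6 = 0$. Solving this using the standard
formula, we get

\[ \lambda = \frac{4 \pm \sqrt{16 - 24}}{2} = \frac{4 \pm \sqrt{-8}}{2}. \]

So the eigenvalues of H are $2 + i\sqrt{2}$ and $2 - i\sqrt{2}$.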
Solution to Exercise 23

(a) The characteristic equation of R gives

\[ \lambda^2 - (\cos\theta + \cos\theta)\lambda + (\cos^2\theta + \sin^2\theta) = 0. \]

Using the fact that $\cos^2\theta + \sin^2\theta = 1$, we get

\[ \lambda^2 - 2\lambda\cos\theta + 1 = 0. \]

Solving this using the standard formula, we get

\begin{align*}
\lambda &= \frac{2\cos\theta \pm \sqrt{4\cos^2\theta - 4}}{2} \\
&= \cos\theta \pm \sqrt{\cos^2\theta - 1} \\
&= \cos\theta \pm \sqrt{-\sin^2\theta} = \cos\theta \pm i\sin\theta = e^{\pm i\theta}.
\end{align*}

Note that we could have obtained this solution much more quickly by
working directly from the characteristic equation:

\[
\begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix} = (\cos\theta - \lambda)^2 + \sin^2\theta = 0.
\]

This gives $(\cos\theta - \lambda)^2 = -\sin^2\theta$, so taking the square root of both
sides gives $(\cos\theta - \lambda) = \pm i\sin\theta$, hence $\lambda = \cos\theta \pm i\sin\theta$. Using
Euler's formula, this can also be expressed as $\lambda = e^{\pm i\theta}$.

(b) The characteristic equation of M is $(l - \lambda)(k - \lambda) = 0$, hence the
eigenvalues are $l$ and $k$.


Solution to Exercise 24 
The characteristic equation of S is

\[
\begin{vmatrix} 1 - \lambda & s \\ 0 & 1 - \lambda \end{vmatrix} = (1 - \lambda)^2 = 0,
\]

hence we have a repeated eigenvalue $\lambda = 1$.


Solution to Exercise 25 

$\operatorname{tr} A = 11 + 15 = 26$ and $\det A = 11 \times 15 - 14 \times 12 = -3$.

To the required level of accuracy,

$\lambda_1 + \lambda_2 = 26.00 = \operatorname{tr} A$ and $\lambda_1\lambda_2 = -3.00 = \det A$.


The given values therefore satisfy the trace and determinant checks for 
eigenvalues of A. 
Solution to Exercise 26

(a) In this case the characteristic equation of A is given by

\[
\det(A - \lambda I) = \begin{vmatrix} 1 - \lambda & -2 & 4 \\ -2 & 7 - \lambda & -10 \\ -1 & 4 & -6 - \lambda \end{vmatrix} = 0.
\]

Using Laplace's rule to expand the determinant in terms of the
elements of the top row gives

\[
(1 - \lambda)[(7 - \lambda)(-6 - \lambda) + 40] - (-2)[-2(-6 - \lambda) - 10] + 4[-8 + (7 - \lambda)] = 0.
\]

This gives

\[ \lambda^3 - 2\lambda^2 - \lambda + 2 = 0. \tag{28} \]

Given that one of the roots is 2, we factorise this by writing

\[ (\lambda - 2)(a\lambda^2 + b\lambda + c) = 0, \]

for some constants $a, b, c$. Expanding this, we get

\[ a\lambda^3 + (b - 2a)\lambda^2 + (c - 2b)\lambda - 2c = 0. \]

Comparing with equation (28), we see that $a = 1$, $b = 0$ and $c = -1$.
So equation (28) can be factorised to give

\[ (\lambda - 2)(\lambda^2 - 1) = (\lambda - 2)(\lambda - 1)(\lambda + 1) = 0, \]

hence the eigenvalues are $-1$, 1 and 2.

(b) In this case the characteristic equation of A is given by

\[
\det(A - \lambda I) = \begin{vmatrix} 4 - \lambda & 7 & 6 \\ 6 & 5 - \lambda & 6 \\ -8 & -10 & -10 - \lambda \end{vmatrix} = 0.
\]

Using Laplace's rule to expand the determinant in terms of the
elements of the top row gives

\[
(4 - \lambda)[(5 - \lambda)(-10 - \lambda) + 60] - 7[6(-10 - \lambda) + 48] + 6[-60 + 8(5 - \lambda)] = 0,
\]

which becomes

\[ \lambda^3 + \lambda^2 - 4\lambda - 4 = 0. \tag{29} \]

Given that one of the roots is 2, we factorise this to give

\[ (\lambda - 2)(\lambda^2 + 3\lambda + 2) = 0 \]

or

\[ (\lambda - 2)(\lambda + 2)(\lambda + 1) = 0, \]

hence the eigenvalues are $-2$, $-1$ and 2.


Solution to Exercise 27 
(a) For the matrix in Exercise 26(a), $\operatorname{tr} A = 1 + 7 - 6 = 2$. Comparing
this with the sum of the eigenvalues $\lambda_1 + \lambda_2 + \lambda_3 = -1 + 1 + 2 = 2$,
we see that they are equal.

We could calculate det A by hand. But instead let us note that from
the equations leading up to equation (28), we have

\[ \det(A - \lambda I) = -(\lambda^3 - 2\lambda^2 - \lambda + 2). \]

Setting $\lambda = 0$ then gives $\det A = -2$. Comparing this with the
product of the eigenvalues $\lambda_1 \times \lambda_2 \times \lambda_3 = -1 \times 1 \times 2 = -2$, we see
that they are equal.

(b) For the matrix in Exercise 26(b), comparing $\operatorname{tr} A = 4 + 5 - 10 = -1$
with $\lambda_1 + \lambda_2 + \lambda_3 = -2 - 1 + 2 = -1$, we see that they are equal.

Also, from equation (29) and the equations leading up to it,
$\det(A - \lambda I) = -(\lambda^3 + \lambda^2 - 4\lambda - 4)$, so setting $\lambda = 0$ we get
$\det A = 4$. Comparing this with $\lambda_1 \times \lambda_2 \times \lambda_3 = -2 \times -1 \times 2 = 4$, we
see that they are equal.
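The trace and determinant checks can also be carried out mechanically. The sketch below is an illustration under stated assumptions: the helper names `trace` and `det3` are hypothetical, and the two matrices are those read off from the Laplace expansions in the solution to Exercise 26.

```python
from math import prod


def trace(A):
    """Sum of the leading-diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def det3(A):
    """Determinant of a 3 x 3 matrix, expanding along the top row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Matrices as reconstructed from Exercise 26, with the eigenvalues found there.
cases = [
    ([[1, -2, 4], [-2, 7, -10], [-1, 4, -6]], [-1, 1, 2]),
    ([[4, 7, 6], [6, 5, 6], [-8, -10, -10]], [-2, -1, 2]),
]

for A, eigenvalues in cases:
    print("trace:", trace(A), "sum of eigenvalues:", sum(eigenvalues))
    print("det:  ", det3(A), "product of eigenvalues:", prod(eigenvalues))
```

Passing both checks does not prove that a set of eigenvalues is correct, but failing either one shows that an error has been made.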


Solution to Exercise 28 


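Since A is a real matrix, another eigenvalue must be the complex
conjugate of $\lambda_1$, i.e. $\lambda_2 = \overline{\lambda_1} = 1 - i$.

Further, the trace rule $\lambda_1 + \lambda_2 + \lambda_3 = \operatorname{tr} A$ gives

\[ (1 + i) + (1 - i) + \lambda_3 = 3, \]

hence $\lambda_3 = 1$.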


Solution to Exercise 29 


Because the matrix is triangular, the eigenvalues are 1 and 2. 


Solution to Exercise 30 


For the eigenvalue to be repeated, we require $\sqrt{(a + d)^2 - 4(ad - b^2)} = 0$,
i.e. $(a - d)^2 + 4b^2 = 0$. This is true only if $a = d$ and $b = 0$, so the only
symmetric $2 \times 2$ matrices with a repeated eigenvalue are of the form

\[ \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}. \]


Solution to Exercise 31 


(a) The eigenvalues are real, since A is real and symmetric. One is
positive and the other negative, since $\lambda_1\lambda_2 = \det A < 0$. Also,
$\lambda_1 + \lambda_2 = \operatorname{tr} A = 50$.

(b) The eigenvalues are the diagonal entries 67 and $-17$, since A is
triangular.

(c) The eigenvalues are real, since A is real and symmetric. In fact, A is
non-invertible, since $\det A = 0$. Thus one eigenvalue is 0. Hence the
other is 306, since $0 + \lambda_2 = \operatorname{tr} A = 306$.


Solution to Exercise 32

(a) The characteristic equation is

\[
\begin{vmatrix} 8 - \lambda & -5 \\ 10 & -7 - \lambda \end{vmatrix} = 0.
\]

Expanding this gives $(8 - \lambda)(-7 - \lambda) + 50 = 0$, which simplifies to
$\lambda^2 - \lambda - 6 = 0$. So the eigenvalues are $\lambda = 3$ and $\lambda = -2$.

Let $\mathbf{v} = [x \;\; y]^T$ be an eigenvector.

• For $\lambda = 3$, the eigenvector equations (20) and (21) become

\[ 5x - 5y = 0 \quad\text{and}\quad 10x - 10y = 0, \]

which reduce to the single equation $y = x$. So (setting $x = 1$) an
eigenvector corresponding to $\lambda = 3$ is $[1 \;\; 1]^T$.

• For $\lambda = -2$, the eigenvector equations become

\[ 10x - 5y = 0 \quad\text{and}\quad 10x - 5y = 0, \]

which reduce to the single equation $y = 2x$. So (setting $x = 1$) an
eigenvector corresponding to $\lambda = -2$ is $[1 \;\; 2]^T$.

(b) The characteristic equation is

\[
\begin{vmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{vmatrix} = 0.
\]

Expanding this gives $(2 - \lambda)^2 - 1 = 0$, which simplifies to
$\lambda^2 - 4\lambda + 3 = 0$. So the eigenvalues are $\lambda = 3$ and $\lambda = 1$.

Let $\mathbf{v} = [x \;\; y]^T$ be an eigenvector.

• For $\lambda = 3$, the eigenvector equations (20) and (21) become

\[ -x + y = 0 \quad\text{and}\quad x - y = 0, \]

which reduce to the single equation $y = x$. So (setting $x = 1$) an
eigenvector corresponding to $\lambda = 3$ is $[1 \;\; 1]^T$.

• For $\lambda = 1$, the eigenvector equations become

\[ x + y = 0 \quad\text{and}\quad x + y = 0, \]

which reduce to the single equation $y = -x$. So (setting $x = 1$) an
eigenvector corresponding to $\lambda = 1$ is $[1 \;\; -1]^T$.

Solution to Exercise 33 
The eigenvector equations are

\begin{align*}
(\cos\theta - \lambda)x - (\sin\theta)y &= 0, \\
(\sin\theta)x + (\cos\theta - \lambda)y &= 0.
\end{align*}

• For $\lambda = \cos\theta + i\sin\theta$, the eigenvector equations become

\[ -(i\sin\theta)x - (\sin\theta)y = 0 \quad\text{and}\quad (\sin\theta)x - (i\sin\theta)y = 0, \]

which reduce to the single equation $iy = x$ (since $\sin\theta \neq 0$ as $\theta$ is not
an integer multiple of $\pi$). So setting $x = i$, a corresponding
eigenvector is $[i \;\; 1]^T$.

(Had we set $x = 1$, the eigenvector would be $[1 \;\; -i]^T$, which is
equally valid as it is just the first eigenvector multiplied by $-i$.)

• For $\lambda = \cos\theta - i\sin\theta$, the eigenvector equations become

\[ (i\sin\theta)x - (\sin\theta)y = 0 \quad\text{and}\quad (\sin\theta)x + (i\sin\theta)y = 0, \]

which reduce to the single equation $-iy = x$ (since $\sin\theta \neq 0$), so a
corresponding eigenvector is $[-i \;\; 1]^T$ or any scalar multiple.

Solution to Exercise 34 


(a) The matrix is upper triangular, so the eigenvalues are 2, $-3$ and 4.

The eigenvector equations for $\mathbf{v} = [x_1 \;\; x_2 \;\; x_3]^T$ are

\begin{align*}
(2 - \lambda)x_1 + x_2 - x_3 &= 0, \\
(-3 - \lambda)x_2 + 2x_3 &= 0, \\
(4 - \lambda)x_3 &= 0.
\end{align*}

• For $\lambda = 2$, the eigenvector equations become

\[ x_2 - x_3 = 0, \quad -5x_2 + 2x_3 = 0, \quad 2x_3 = 0, \]

which reduce to $x_2 = x_3 = 0$. If we assign $x_1 = k$, for arbitrary
non-zero k, then a corresponding eigenvector is $k[1 \;\; 0 \;\; 0]^T$.
Choosing $k = 1$ gives $\mathbf{v} = [1 \;\; 0 \;\; 0]^T$.

• For $\lambda = -3$, the eigenvector equations become

\[ 5x_1 + x_2 - x_3 = 0, \quad 2x_3 = 0, \quad 7x_3 = 0, \]

which reduce to $5x_1 + x_2 = 0$ and $x_3 = 0$. So assigning $x_1 = k$, for
arbitrary non-zero k, we get a corresponding eigenvector
$k[1 \;\; -5 \;\; 0]^T$. Choosing $k = 1$ gives $\mathbf{v} = [1 \;\; -5 \;\; 0]^T$.

• For $\lambda = 4$, the eigenvector equations become

\[ -2x_1 + x_2 - x_3 = 0, \quad -7x_2 + 2x_3 = 0, \quad 0 = 0. \]

Choosing $x_3 = 14$ keeps the numbers simple, and a corresponding
eigenvector is $\mathbf{v} = [-5 \;\; 4 \;\; 14]^T$.

(b) The characteristic equation is

\[
\begin{vmatrix} -\lambda & 2 & 0 \\ -2 & -\lambda & 0 \\ 0 & 0 & 1 - \lambda \end{vmatrix} = 0.
\]

To simplify the evaluation of the determinant, we interchange the first
and third rows (remember that this just changes the sign of the
determinant). This gives the characteristic equation as
$(1 - \lambda)(\lambda^2 + 4) = 0$, so the eigenvalues are 1, $2i$ and $-2i$.

The eigenvector equations for $\mathbf{v} = [x_1 \;\; x_2 \;\; x_3]^T$ are

\begin{align*}
-\lambda x_1 + 2x_2 &= 0, \\
-2x_1 - \lambda x_2 &= 0, \\
(1 - \lambda)x_3 &= 0.
\end{align*}

• For $\lambda = 1$, the eigenvector equations become

\[ -x_1 + 2x_2 = 0, \quad -2x_1 - x_2 = 0, \quad 0 = 0. \]

These give $x_1 = x_2 = 0$. So choosing $x_3 = 1$, a corresponding
eigenvector is $[0 \;\; 0 \;\; 1]^T$.

• For $\lambda = 2i$, the eigenvector equations become

\[ -2ix_1 + 2x_2 = 0, \quad -2x_1 - 2ix_2 = 0, \quad (1 - 2i)x_3 = 0, \]

which reduce to $x_2 = ix_1$ and $x_3 = 0$. So choosing $x_1 = 1$, a
corresponding eigenvector is $[1 \;\; i \;\; 0]^T$.

• Similarly, an eigenvector corresponding to $\lambda = -2i$ is $[1 \;\; -i \;\; 0]^T$.


Solution to Exercise 35 


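The eigenvector equations for $\mathbf{v} = [x_1 \;\; x_2 \;\; x_3]^T$ are

\begin{align*}
(4 - \lambda)x_1 + 7x_2 + 6x_3 &= 0, \\
6x_1 + (5 - \lambda)x_2 + 6x_3 &= 0, \\
-8x_1 - 10x_2 + (-10 - \lambda)x_3 &= 0.
\end{align*}

• For $\lambda = -2$, the augmented matrix is

\[
\left[\begin{array}{rrr|r}
6 & 7 & 6 & 0 \\
6 & 7 & 6 & 0 \\
-8 & -10 & -8 & 0
\end{array}\right]
\]

In this case it is helpful to interchange rows before doing anything
else, so the starting arrangement will be

\[
\left[\begin{array}{rrr|r}
-8 & -10 & -8 & 0 \\
6 & 7 & 6 & 0 \\
6 & 7 & 6 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

Reducing the elements below the leading diagonal in column 1 to zero
(and shortcutting the usual procedure by subtracting the two identical
rows at this early stage):

\[
\begin{array}{l} {} \\ R_2 + \tfrac{3}{4}R_1 \\ R_3 - R_2 \end{array}
\left[\begin{array}{rrr|r}
-8 & -10 & -8 & 0 \\
0 & -\tfrac{1}{2} & 0 & 0 \\
0 & 0 & 0 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

The final row of zeros allows us to assign $x_3$ the arbitrary non-zero
value k. The second row tells us that $x_2 = 0$, and back substituting
gives $x_1 = -k$. Thus the general form of the eigenvector is
$\mathbf{v} = k[-1 \;\; 0 \;\; 1]^T$ (equivalently, any non-zero multiple of $[1 \;\; 0 \;\; -1]^T$),
where k is an arbitrary non-zero value.

• For $\lambda = -1$, the augmented matrix is

\[
\left[\begin{array}{rrr|r}
5 & 7 & 6 & 0 \\
6 & 6 & 6 & 0 \\
-8 & -10 & -9 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

Reducing the elements below the leading diagonal in column 1 to zero:

\[
\begin{array}{l} {} \\ 5R_2 - 6R_1 \\ 5R_3 + 8R_1 \end{array}
\left[\begin{array}{rrr|r}
5 & 7 & 6 & 0 \\
0 & -12 & -6 & 0 \\
0 & 6 & 3 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

Reducing the element below the leading diagonal in column 2 to zero:

\[
\begin{array}{l} {} \\ {} \\ 2R_{3a} + R_{2a} \end{array}
\left[\begin{array}{rrr|r}
5 & 7 & 6 & 0 \\
0 & -12 & -6 & 0 \\
0 & 0 & 0 & 0
\end{array}\right]
\]

The final row of zeros allows us to assign $x_3$ the arbitrary non-zero
value k. Back substitution then tells us that $x_2 = -\tfrac{1}{2}k$, and back
substituting again gives $x_1 = -\tfrac{1}{2}k$. Thus the general form of the
eigenvector is $\mathbf{v} = k[-\tfrac{1}{2} \;\; -\tfrac{1}{2} \;\; 1]^T$, where k is an arbitrary non-zero
value. Choosing $k = 2$ gives $\mathbf{v} = [-1 \;\; -1 \;\; 2]^T$.

• For $\lambda = 2$, the augmented matrix is

\[
\left[\begin{array}{rrr|r}
2 & 7 & 6 & 0 \\
6 & 3 & 6 & 0 \\
-8 & -10 & -12 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_2 \\ R_3 \end{array}
\]

Reducing the elements below the leading diagonal in column 1 to zero:

\[
\begin{array}{l} {} \\ R_2 - 3R_1 \\ R_3 + 4R_1 \end{array}
\left[\begin{array}{rrr|r}
2 & 7 & 6 & 0 \\
0 & -18 & -12 & 0 \\
0 & 18 & 12 & 0
\end{array}\right]
\begin{array}{l} R_1 \\ R_{2a} \\ R_{3a} \end{array}
\]

Reducing the element below the leading diagonal in column 2 to zero:

\[
\begin{array}{l} {} \\ {} \\ R_{3a} + R_{2a} \end{array}
\left[\begin{array}{rrr|r}
2 & 7 & 6 & 0 \\
0 & -18 & -12 & 0 \\
0 & 0 & 0 & 0
\end{array}\right]
\]

The final row of zeros allows us to assign $x_3$ the arbitrary non-zero
value k. Back substitution then tells us that $x_2 = -\tfrac{2}{3}k$, and back
substituting again gives $x_1 = -\tfrac{2}{3}k$. Thus the general form of the
eigenvector is $\mathbf{v} = k[-\tfrac{2}{3} \;\; -\tfrac{2}{3} \;\; 1]^T$, where k is an arbitrary non-zero
value. Choosing $k = 3$ gives $\mathbf{v} = [-2 \;\; -2 \;\; 3]^T$.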
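Each eigenvector found above can be confirmed by checking that $A\mathbf{v} = \lambda\mathbf{v}$. The following sketch does this with exact integer arithmetic; the function name `is_eigenpair` is an illustrative choice, and the matrix is read off from the eigenvector equations at the start of this solution.

```python
def is_eigenpair(A, v, lam):
    """Return True if v is non-zero and A v equals lam * v exactly."""
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return any(x != 0 for x in v) and Av == [lam * x for x in v]


# Coefficient matrix read off from the eigenvector equations.
A = [[4, 7, 6], [6, 5, 6], [-8, -10, -10]]

# (eigenvalue, eigenvector) pairs obtained by Gaussian elimination.
pairs = [(-2, [1, 0, -1]), (-1, [-1, -1, 2]), (2, [-2, -2, 3])]

for lam, v in pairs:
    print(lam, v, is_eigenpair(A, v, lam))
```

Because any non-zero scalar multiple of an eigenvector is also an eigenvector, the check succeeds equally for $[-1 \;\; 0 \;\; 1]^T$ in place of $[1 \;\; 0 \;\; -1]^T$.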


Solution to Exercise 36 


(a) $\mathbf{s}_1^T\mathbf{s}_2 = (2)(3) + (1)(-6) = 0$, so the column vectors are orthogonal.

(b) $\mathbf{t}_1^T\mathbf{t}_2 = (2)(-2) + (2)(-2) + (1)(0) = -8$, so the column vectors are
not orthogonal.

Solution to Exercise 37

The characteristic equation is

\[
\begin{vmatrix} 5 - \lambda & 2 \\ 2 & 2 - \lambda \end{vmatrix} = 0.
\]

Expanding the determinant gives $(5 - \lambda)(2 - \lambda) - 4 = 0$, hence the
characteristic equation may be written as $\lambda^2 - 7\lambda + 6 = 0$, the roots of
which are the eigenvalues $\lambda_1 = 6$ and $\lambda_2 = 1$. Clearly both eigenvalues are
real, as they should be for a real symmetric matrix.

The eigenvector equation is

\[
\begin{bmatrix} 5 - \lambda & 2 \\ 2 & 2 - \lambda \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]

• $\lambda_1 = 6$ gives the pair of equations

\[ -x + 2y = 0 \quad\text{and}\quad 2x - 4y = 0, \]

which are equivalent to $x = 2y$. Choosing $y = 1$, we get $\mathbf{v}_1 = [2 \;\; 1]^T$.

• $\lambda_2 = 1$ gives the pair of equations

\[ 4x + 2y = 0 \quad\text{and}\quad 2x + y = 0, \]

which are equivalent to $y = -2x$. Choosing $x = 1$, we get
$\mathbf{v}_2 = [1 \;\; -2]^T$.

The inner product is $\mathbf{v}_1^T\mathbf{v}_2 = 2(1) + 1(-2) = 0$, so the eigenvectors are
orthogonal.
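The same calculation can be sketched for any real symmetric $2 \times 2$ matrix with a non-zero off-diagonal element. The function below is an illustrative sketch only: its name and the parametrisation of the matrix as rows $[a \;\; b]$ and $[b \;\; d]$ are assumptions made here, and the eigenvectors it returns are not normalised.

```python
import math


def symmetric_eigen_2x2(a, b, d):
    """Eigenvalues and eigenvectors of the real symmetric matrix [[a, b], [b, d]].

    Assumes b != 0, so that [b, lam - a] is a non-zero eigenvector.
    """
    tr, det = a + d, a * d - b * b
    # The discriminant equals (a - d)^2 + 4 b^2, which is never negative,
    # so both eigenvalues are real.
    root = math.sqrt(tr * tr - 4 * det)
    pairs = []
    for lam in ((tr + root) / 2, (tr - root) / 2):
        # First row of (A - lam I) v = 0 reads (a - lam) x + b y = 0,
        # which is satisfied by x = b, y = lam - a.
        pairs.append((lam, [b, lam - a]))
    return pairs


(lam1, v1), (lam2, v2) = symmetric_eigen_2x2(5, 2, 2)
print(lam1, v1)  # eigenvalue 6 with eigenvector proportional to [2, 1]
print(lam2, v2)  # eigenvalue 1 with eigenvector proportional to [1, -2]
print(sum(x * y for x, y in zip(v1, v2)))  # inner product of the eigenvectors
```

For this matrix the second eigenvector is returned as a multiple of $[1 \;\; -2]^T$, and the inner product of the two eigenvectors is zero, as expected for a symmetric matrix with distinct eigenvalues.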
Solution to Exercise 38 


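Since $\mathbf{v}_1^T\mathbf{v}_1 = 4 + 1 = 5$ and $\mathbf{v}_2^T\mathbf{v}_2 = 1 + 4 = 5$, we have

\[
\hat{\mathbf{v}}_1 = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad
\hat{\mathbf{v}}_2 = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ -2 \end{bmatrix}.
\]
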
Solution to Exercise 39 
From equation (24), $\mathbf{v} = (\hat{\mathbf{v}}_1^T\mathbf{v})\hat{\mathbf{v}}_1 + (\hat{\mathbf{v}}_2^T\mathbf{v})\hat{\mathbf{v}}_2$. Calculating the coefficients,
we have


os = 2 y H “ a dal 40 ee = 1 3] H = <= 
Hence 
30 Od 
v= / = Je 


We can easily check this by substituting the values for v1 and vo: 


=f] [4-2 6)-[1. 
Introduction

In Unit 2 you saw that the solution of the differential equation

\[ \frac{dx}{dt} = \lambda x \tag{1} \]

(where $\lambda$ is a constant) is

\[ x(t) = x_0 e^{\lambda t}. \tag{2} \]

Here $x_0$ is an arbitrary constant. This unit generalises this differential
equation and solution to the case of a system of several differential
equations with more than one dependent variable. An example is the pair
of differential equations

\begin{align}
\dot{x} &= ax + by, \tag{3} \\
\dot{y} &= cx + dy, \tag{4}
\end{align}


where a, b, c and d are real constants. Here x(t) and y(t) are the dependent 
variables for which we want to find solutions. Such systems of differential 
equations arise frequently across all the mathematical sciences — in fact, 
whenever a system has constituent parts that interact with each other. 


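Because both of the unknown functions, x(t) and y(t), occur in both of the
differential equations, they must be solved ‘simultaneously’. At first sight
this appears to be a difficult task. However, by using matrices it turns out
that equations (3) and (4) can be cast in a form that is very similar to
equation (1):

\[
\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix},
\]

which can then be written as

\[
\dot{\mathbf{x}} = A\mathbf{x}, \quad\text{where } \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} \text{ and } A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. \tag{5}
\]

(Don’t worry if you can’t follow all the details here. This introduction is
meant to be a sketch of what is to come. We cover the same material at a
slower pace and in more depth in Section 1.)

This looks just like equation (1), and it would be satisfying to find a
solution inspired by equation (2). An obvious extension to try is a solution
of the form

\[ \mathbf{x}(t) = \mathbf{v}e^{\lambda t}, \tag{6} \]

where $\lambda$ is some constant scalar and $\mathbf{v}$ is a constant vector. Substituting
this into the left-hand side of equation (5), we get

\[
\dot{\mathbf{x}} = \frac{d}{dt}\left(\mathbf{v}e^{\lambda t}\right) = \mathbf{v}\frac{d}{dt}e^{\lambda t} = \lambda\mathbf{v}e^{\lambda t}.
\]

Substituting from equation (6) into the right-hand side of equation (5),
$\dot{\mathbf{x}} = A\mathbf{x}$, gives

\[ A\mathbf{x} = A\mathbf{v}e^{\lambda t}. \]

Equating both sides of equation (5) and cancelling the non-zero factor
$e^{\lambda t}$, we get

\[ A\mathbf{v} = \lambda\mathbf{v}. \]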
But this is the eigenvalue equation that we studied in Unit 5. So we see
that equation (6) is a solution of equation (5) when $\lambda$ and $\mathbf{v}$ are an
eigenvalue and a corresponding eigenvector of the matrix A. In fact, this
observation is the central message of this unit, and we highlight it because
of its importance.


Putting systems of differential equations into matrix form, and finding
the eigenvalues and eigenvectors of the matrix of coefficients, is the
key to solving systems of any number of linear differential equations
of any order.
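This central observation can be checked numerically for a particular case. The sketch below is an illustration under stated assumptions: the matrix with rows $[3 \;\; 2]$ and $[1 \;\; 4]$, which has eigenvalue 5 with eigenvector $[1 \;\; 1]^T$, is chosen here purely as an example, and the derivative is estimated by a central difference rather than computed exactly.

```python
import math

A = [[3, 2], [1, 4]]  # example matrix (an assumption made for illustration)
lam, v = 5, [1, 1]    # an eigenvalue of A and a corresponding eigenvector


def x(t):
    """Trial solution x(t) = v * exp(lam * t) from equation (6)."""
    return [vi * math.exp(lam * t) for vi in v]


def residual(t, h=1e-6):
    """Largest component of |dx/dt - A x| at time t.

    The derivative dx/dt is approximated by a central difference with step h.
    """
    x_plus, x_minus, x_now = x(t + h), x(t - h), x(t)
    derivative = [(p - m) / (2 * h) for p, m in zip(x_plus, x_minus)]
    Ax = [sum(a * xi for a, xi in zip(row, x_now)) for row in A]
    return max(abs(d - r) for d, r in zip(derivative, Ax))


for t in (0.0, 0.1, 0.3):
    print(t, residual(t))
```

The residuals are zero up to the error of the finite-difference approximation, which is what we expect when $\lambda$ and $\mathbf{v}$ form an eigenvalue and eigenvector pair; replacing $\mathbf{v}$ by a vector that is not an eigenvector gives a residual that is clearly non-zero.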


Before developing the mathematics further, let us first look at an example of
how equations like (5) arise in a simple model. We will not give full details 
of the modelling process, since the aim is to provide motivation and to give 
a fairly rapid impression of where and how systems of differential equations 
might occur in practice. You should not spend too much time dwelling on 

the details, as we will never ask you to derive the differential equations for 
a system in any assessment. 


Fluid in tanks and pipes 


Many engineering processes involve fluids being transferred in pipes from 
one tank to another. An example is shown in Figure 1. This could describe 
an industrial process where a liquid is pumped into a tank A, where it is 
treated in some way, before being transferred through a pipe to tank B, 
where it is stored before being supplied to run a machine. 


Figure 1 The depths of liquid in two tanks are x(t) and y(t)


To describe the behaviour of this system we need two variables, namely the
depth of the liquid in each tank. The depths of liquid at time t in tank A
and tank B are x(t) and y(t), respectively (measured in metres). We
assume that liquid is poured into tank A at a rate of $R_{\text{in}}(t)$ (measured in
cubic metres per second). There is a flow of liquid from tank A to tank B
at a rate $R_{AB}(t)$, and fluid is pumped out of tank B at a rate $R_{\text{out}}(t)$.


The rate of change of the depth in each tank is proportional to the net rate
at which fluid is added to that tank:

\begin{align}
\dot{x} &= k_A \left[ R_{\text{in}}(t) - R_{AB}(t) \right], \tag{7} \\
\dot{y} &= k_B \left[ R_{AB}(t) - R_{\text{out}}(t) \right], \tag{8}
\end{align}

where $k_A$ and $k_B$ are two constants. We assume that the rate at which
liquid flows through the pipe connecting tank A and tank B is proportional
to the difference in height of the fluid in the two tanks, i.e. there is no
pumping of fluid from tank A to tank B.

Thus we write

\[ R_{AB} = K_{AB}(x - y), \tag{9} \]

where $K_{AB}$ is a constant.

Substituting equation (9) into equations (7) and (8) gives the equations of
motion for the heights of the fluid:

\begin{align*}
\dot{x} &= -k_A K_{AB}\,x + k_A K_{AB}\,y + k_A R_{\text{in}}(t), \\
\dot{y} &= k_B K_{AB}\,x - k_B K_{AB}\,y - k_B R_{\text{out}}(t).
\end{align*}

These equations have the form

\begin{align}
\dot{x} &= ax + by + f(t), \tag{10} \\
\dot{y} &= cx + dy + g(t). \tag{11}
\end{align}

Here a, b, c and d are constants, and f(t) and g(t) are functions of time
describing the rate at which fluid is poured into the first tank and pumped
out of the second tank, respectively: $f(t) = k_A R_{\text{in}}(t)$, $g(t) = -k_B R_{\text{out}}(t)$.


If we set f(t) = 0 and g(t) = 0, so that no fluid is entering the first tank or
being pumped from the second tank, then equations (10) and (11) are
exactly the same as equations (3) and (4), with solution given by
equation (6) in terms of the eigenvalues and eigenvectors of the matrix of
coefficients $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. For $f \neq 0$ and $g \neq 0$, a solution in terms of
eigenvalues and eigenvectors can still be found, as we will see in the next
section. In fact, the main job of Section 1 will be to find the general
solution of systems of differential equations that have the same form as
equations (10) and (11).
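To make the case f(t) = 0, g(t) = 0 concrete, here is a minimal sketch of the resulting two-tank solution. It assumes the coefficients $a = -k_A K_{AB}$, $b = -a$, $c = k_B K_{AB}$, $d = -c$, abbreviated below as `p` $= k_A K_{AB}$ and `q` $= k_B K_{AB}$. The matrix of coefficients then has eigenvalue 0 with eigenvector $[1 \;\; 1]^T$ and eigenvalue $-(p + q)$ with eigenvector $[p \;\; -q]^T$, as can be checked by direct multiplication. The function name and the numerical values are hypothetical.

```python
import math


def tank_depths(t, x0, y0, p, q):
    """Depths (x, y) at time t for dx/dt = -p x + p y, dy/dt = q x - q y.

    p and q stand for k_A * K_AB and k_B * K_AB, both assumed positive.
    The solution is c1 * [1, 1] + c2 * [p, -q] * exp(-(p + q) t), with
    c1 and c2 fixed by the initial depths x0 and y0.
    """
    c1 = (q * x0 + p * y0) / (p + q)  # coefficient of eigenvector [1, 1]
    c2 = (x0 - y0) / (p + q)          # coefficient of eigenvector [p, -q]
    decay = math.exp(-(p + q) * t)
    return c1 + c2 * p * decay, c1 - c2 * q * decay


# Hypothetical values: tank A starts 2 m deep, tank B starts empty.
for t in (0, 1, 5, 20):
    x_t, y_t = tank_depths(t, x0=2.0, y0=0.0, p=0.2, q=0.3)
    print(t, round(x_t, 4), round(y_t, 4))
```

With no inflow or outflow, the eigenvalue 0 corresponds to the final state in which both tanks have the same depth, and the negative eigenvalue sets the rate at which the difference in depth decays.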


This application to the flow of fluid in tanks may seem somewhat 
specialised. However, all fluid flows obey the same physical laws, and 
systems of coupled linear differential equations find applications in many 
contexts involving fluid flow, such as understanding the drainage of 
rainwater by river systems. 


The mathematical equations that describe the flow of fluids also apply to 
other physical phenomena such as the flow of current in electrical circuits, 
and non-physical phenomena such as models of the flow of money in the 
economy. The box below tells the story of an ingenious economist who 
exploited this duality to build an analogue computer to model the British 
economy. 


The values of the constants in equations (10) and (11) are
$a = -k_A K_{AB}$, $b = -a$, $c = k_B K_{AB}$, $d = -c$.
MONIAC: macroeconomics for plumbers!


The MONIAC (Monetary National Income Analogue Computer) was 
created in 1949 by electrical engineer turned economist Bill Phillips to 
model the economy of the United Kingdom (UK) — see Figure 2. 
Phillips was still a student at the London School of Economics when 
he created his first MONIAC in his landlady’s garage in Croydon at a 
cost of £400. 


The MONIAC was an analogue computer that used the flow of water 
to model the workings of an economy. It consisted of a series of 
transparent tanks and pipes. Each tank represented some aspect of 
the UK national economy, and the flow of money around the economy 
was illustrated by coloured water. At the top was a large tank called 
the treasury. Water (representing money) flowed from the treasury to 
other tanks representing the various ways in which the country could 
spend its money — for example, there were tanks for health and 
education. To increase spending on health, a tap could be opened to 
drain water from the treasury to the tank that represented health 
spending. Water then ran further down the model to other tanks, 
representing other interactions in the economy. Water could be 
pumped back to the treasury from some of the tanks to represent 
taxation. Changes in tax rates were modelled by increasing or 
decreasing pumping speeds. Import and export were represented by 
water draining from the model and additional water being poured 
into the model. 


his MONIAC 


Phillips had realised that the set of differential equations that 
described the flow of money around the economy was the same as the 
set that described the flow of fluids between tanks. So by building the 
MONIAC, he was building a model of the economy. 


The MONIAC had primarily been designed as a teaching aid, but was 
soon discovered also to be an effective economic simulator (accurate 
to +2%), at a time when electronic digital computers that could run 
complex economic simulations were unavailable. A number of 
MONIAC machines were eventually built, ending up in companies, 
banks and universities around the world. One of the few remaining 
working machines is now on permanent display at the British Science 
Museum. 


Systems of coupled linear equations in a broader 
context 


Equations like (3) and (4) are described as linear because the right-hand
side involves only linear functions of x and y. However, most dynamical
systems are described by non-linear equations of the form

\begin{align*}
\dot{x} &= f(x, y), \\
\dot{y} &= g(x, y),
\end{align*}

where f(x, y) and g(x, y) are any functions of x and y. Although in
general it is not possible to solve such equations analytically, it is possible
to get a lot of information about how the solutions behave using analytical
methods. In particular, the behaviour of the solution near equilibrium
points, where $\dot{x} = \dot{y} = 0$, is studied using the techniques of this section. We
will return to the discussion of non-linear dynamics in Unit 13.


There is another reason for studying systems of coupled linear equations. 
Quantum mechanics, which is our most fundamental theory of physical 
processes, can be expressed in terms of an equation of motion, similar to 
equation (5), for vectors in an abstract space of physical states. 


From coupled differential equations to images of brains 


Particles like electrons and protons have a property called spin, which 
allows them to behave like tiny magnets when placed in a strong 
magnetic field. Hospitals use an invaluable imaging technique called 
magnetic resonance imaging (MRI) that is based on this idea. 


Your body contains many hydrogen atoms distributed unevenly across 
your various tissues. At the heart of each hydrogen atom is a spinning 
proton that can be thought of as a tiny magnet. According to the 
general principles of quantum physics, the spin of such a proton is 
represented by a two-component column vector v, called the spin 
state vector. The components of this vector are complex numbers, 
and they tell us about the orientation of the proton’s spin at any 
given instant. The spin state vector may vary with time, satisfying an 
equation known as the Schrödinger equation, which takes the form

\[ \frac{d\mathbf{v}}{dt} = \frac{1}{i\hbar} H\mathbf{v}. \tag{12} \]

Here, H is a $2 \times 2$ matrix called the Hamiltonian, $\mathbf{v}$ is a $2 \times 1$ column
vector (the spin state vector) and $\hbar$ is a physical constant that
appears throughout quantum physics. The only unusual feature of
equation (12) is that it involves complex numbers: in general, H and 
v both have complex elements. Otherwise, equation (12) is of exactly 
the form that you have met before. 


A concerted oscillation, in the directions of proton spins, is set off by
the MRI scanner somewhere inside your body. This produces a tiny
electromagnetic signal that can be detected outside. So
understanding how to solve equation (12), and hence the motion of
proton spins, provides the key to creating MRI images such as that
shown in Figure 3.

Figure 3 MRI image of the brain

Note that sometimes we group a system of equations using a
brace ({), as here, but often we do not.




Study guide 


In this unit we show you how to find the solution of certain systems of 
differential equations. We begin in the next section with systems where the 
derivatives are of first order, like those in equations (3) and (4), and 
equations (10) and (11). In Section 2 we consider the most commonly 
occurring systems where the derivatives are of second order. In the 
physical or engineering sciences, these systems describe the motion of 
objects that are coupled together in such a way that they have a vibrating 
or oscillating motion. In Section 3 we consider a special type of vibration, 
called ‘normal modes’, where the components of the system all vibrate 
with the same frequency. 


This unit assumes a knowledge of first- and second-order differential 
equations, as covered in Units 2 and 3, and of eigenvalues and eigenvectors, 
as covered in Unit 5. 


1 First-order systems 


In this section we describe the techniques that you need in order to solve 
systems of first-order differential equations. Most of the examples and 
exercises concentrate on systems with only two (occasionally three) 
dependent variables, because the algebra is easier to understand. Once you 
have learned how to solve these, the extension of the technique to higher 
numbers of dependent variables is straightforward. 


Subsection 1.1 describes how to write systems of differential equations in 
matrix form, and defines two types: homogeneous and inhomogeneous. 
Subsections 1.2 and 1.3 show you how to solve the homogeneous type. 
Subsections 1.4 and 1.5 show you how to solve the inhomogeneous type. 
The method parallels the methods that you saw in Unit 3 for solving 
second-order homogeneous and inhomogeneous differential equations of a 
single dependent variable.


1.1 Matrix notation 


In Unit 5 you saw that any system of linear equations can be written in 
matrix form. For example, the equations 


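\[ \begin{cases} 3x + 2y = 5, \\ \phantom{3}x + 4y = 5, \end{cases} \]

can be written in matrix form as

\[ \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 5 \end{pmatrix}, \]

that is, as

\[ \mathbf{A}\mathbf{x} = \mathbf{b}, \quad \text{where } \mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} 5 \\ 5 \end{pmatrix}. \]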




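In a similar way, we can write systems of linear differential equations in
matrix form. To see what is involved, consider the system

\[ \begin{cases} \dot{x} = 3x + 2y + 5t, \\ \dot{y} = \phantom{3}x + 4y + 5, \end{cases} \qquad (13) \]

where $x$ and $y$ are functions of $t$. This has the same form as equations (10)
and (11) derived in the Introduction. It can be written in matrix form as

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 5t \\ 5 \end{pmatrix}, \]

that is, as

\[ \dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}, \quad \text{where } \mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad \mathbf{h} = \begin{pmatrix} 5t \\ 5 \end{pmatrix}. \]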


So we have converted a system of differential equations into a single matrix 
differential equation. Note that the components of x are functions of t, and 
so generally are the components of h. The matrix A is independent of t 
and is called the matrix of coefficients. 


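We can similarly represent systems of three, or more, linear differential
equations in matrix form. For example, the system

\[ \begin{cases} \dot{x} = 3x + 2y + 2z + e^{t}, \\ \dot{y} = 2x + 2y \phantom{{}+2z} + 2e^{t}, \\ \dot{z} = 2x \phantom{{}+2y} + 4z, \end{cases} \]

can be written in matrix form as $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$, where

\[ \mathbf{A} = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad \text{and} \quad \mathbf{h} = \begin{pmatrix} e^{t} \\ 2e^{t} \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} e^{t}. \qquad (14) \]

Notice that we write the derivatives on the left-hand side. On the
right-hand side we vertically align all the terms in $x$, $y$ and $z$
separately, leaving a space where a term is zero.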


There are two types of matrix differential equation that you must be able 
to recognise. These are defined as follows. 


Definition 
A matrix differential equation of the form $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$ is said to be
homogeneous if $\mathbf{h} = \mathbf{0}$, and inhomogeneous otherwise.

Note that in an inhomogeneous system, some, but not all, of the
components of $\mathbf{h}$ may be 0.


For example, the system

\[ \begin{cases} \dot{x} = 2x + 3y, \\ \dot{y} = 2x + \phantom{3}y, \end{cases} \qquad (15) \]

where $x$ and $y$ are functions of $t$, has matrix form

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \qquad (16) \]

and so is homogeneous, whereas systems (13) and (14) are inhomogeneous.

In guessing this form for the solution, we are assuming that all
components have the same exponential dependence on time.

Exercise 1


Write each of the following systems in matrix form, and classify it as 
homogeneous or inhomogeneous. 


(a) asia le 


y= 2x —2 
r= 
0) {= 
y= 
T= o£ 


1.2 Homogeneous systems: the eigenvalue method 


General solution 


We now show how to solve the first-order homogeneous case, $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$. This
was partially discussed in the Introduction, where we discovered that the
method involves calculating eigenvalues and eigenvectors of matrices,
which was covered in the previous unit.


Suppose that we are given a set of coupled, first-order, homogeneous, 
linear differential equations, like those in system (15). Then we put them 
in matrix form 


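\[ \dot{\mathbf{x}} = \mathbf{A}\mathbf{x}, \qquad (17) \]

as in equation (16). We assume that the matrix $\mathbf{A}$ has coefficients that are
constants, i.e. do not depend on the independent variable $t$. Then $\mathbf{A}$ has
eigenvectors $\mathbf{v}$ and eigenvalues $\lambda$ that are also independent of $t$ and satisfy
the eigenvalue equation

\[ \mathbf{A}\mathbf{v} = \lambda\mathbf{v}. \qquad (18) \]

Then it is easy to show that $\mathbf{x}(t) = \mathbf{v}e^{\lambda t}$ is a solution of the differential
equation. Substituting $\mathbf{x}(t) = \mathbf{v}e^{\lambda t}$ into the left-hand side of equation (17),
we get

\[ \dot{\mathbf{x}} = \frac{d}{dt}\left(\mathbf{v}e^{\lambda t}\right) = \mathbf{v}\,\frac{d}{dt}\left(e^{\lambda t}\right) = \lambda\mathbf{v}e^{\lambda t}. \]

Substituting into the right-hand side of equation (17) and using
equation (18), we get

\[ \mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{v}e^{\lambda t} = \lambda\mathbf{v}e^{\lambda t}. \]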

So the left- and right-hand sides are equal, and we have the following 
result. 




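A system of differential equations

\[ \dot{\mathbf{x}} = \mathbf{A}\mathbf{x} \]

has a solution given by

\[ \mathbf{x} = \mathbf{v}e^{\lambda t}, \]

where $\lambda$ is an eigenvalue of the matrix $\mathbf{A}$ corresponding to an
eigenvector $\mathbf{v}$.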


The question now arises as to how we find the general solution. The 
following example illustrates the idea for a pair of simultaneous differential 
equations. 


Example 1 


(a) Find two independent solutions of the simultaneous differential 
equations 


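\[ \begin{cases} \dot{x} = x + 4y, \\ \dot{y} = x - 2y. \end{cases} \]

(Hint: The matrix $\begin{pmatrix} 1 & 4 \\ 1 & -2 \end{pmatrix}$ has eigenvectors $[4 \;\; 1]^T$ and $[1 \;\; {-1}]^T$ with
corresponding eigenvalues 2 and −3.)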
(b) Find the general solution. 
Solution 
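(a) The differential equations can be written in the matrix form

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} 1 & 4 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \]

So the matrix of coefficients is

\[ \mathbf{A} = \begin{pmatrix} 1 & 4 \\ 1 & -2 \end{pmatrix}. \]

Therefore, using the values for the eigenvalues and eigenvectors given in
the hint, we can construct two independent solutions of the form $\mathbf{v}e^{\lambda t}$:

\[ \mathbf{x}_1 = \begin{pmatrix} 4 \\ 1 \end{pmatrix} e^{2t} \quad \text{and} \quad \mathbf{x}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} e^{-3t}. \]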


(b) So there are two independent solutions, $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$. It turns out,
for reasons discussed below, that the general solution is a general linear
combination of these, i.e. $\alpha\mathbf{x}_1 + \beta\mathbf{x}_2$. (This is a simple extension of the
principle of superposition discussed in Unit 3.) Hence the general solution is

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \alpha \begin{pmatrix} 4 \\ 1 \end{pmatrix} e^{2t} + \beta \begin{pmatrix} 1 \\ -1 \end{pmatrix} e^{-3t}, \]

where $\alpha$ and $\beta$ are arbitrary constants. So the general solution in
component form is

\[ x = 4\alpha e^{2t} + \beta e^{-3t} \quad \text{and} \quad y = \alpha e^{2t} - \beta e^{-3t}. \]

Important note: We cannot change the equation for $x$ to
$x = \alpha e^{2t} + \beta e^{-3t}$ (i.e. absorb the constant 4 into $\alpha$) and leave the
equation for $y$ unchanged, since $\alpha$ occurs in both the equation for $x$
and the equation for $y$. This is an important difference between
differential equations of a single variable and coupled differential
equations.


Exercise 2 
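(a) Find two independent solutions of

\[ \begin{cases} \dot{x} = 3x + 2y, \\ \dot{y} = \phantom{3}x + 4y. \end{cases} \]

(Hint: The eigenvectors of $\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$ are $[1 \;\; 1]^T$ and $[{-2} \;\; 1]^T$ with
corresponding eigenvalues 5 and 2.)

(b) Find the general solution.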


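The reason why the linear combination of solutions is the general solution
can be seen as follows. First, since $\dot{\mathbf{x}}_1 = \mathbf{A}\mathbf{x}_1$ and $\dot{\mathbf{x}}_2 = \mathbf{A}\mathbf{x}_2$, the linear
combination $\mathbf{x} = \alpha\mathbf{x}_1 + \beta\mathbf{x}_2$ satisfies

\[ \dot{\mathbf{x}} = \alpha\dot{\mathbf{x}}_1 + \beta\dot{\mathbf{x}}_2 = \alpha\mathbf{A}\mathbf{x}_1 + \beta\mathbf{A}\mathbf{x}_2 = \mathbf{A}(\alpha\mathbf{x}_1 + \beta\mathbf{x}_2) = \mathbf{A}\mathbf{x}. \]

So $\mathbf{x} = \alpha\mathbf{x}_1 + \beta\mathbf{x}_2$ is a solution of the system of differential equations if $\mathbf{x}_1$
and $\mathbf{x}_2$ are. The fact that it is the general solution follows because $\mathbf{x}$
contains two arbitrary constants ($\alpha$ and $\beta$), and this is sufficient for a
system of two first-order differential equations.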


In Example 1 we saw that a system of two differential equations has a
2 × 2 matrix of coefficients, which gives rise to two independent solutions.
The general solution is a general linear combination of these and so has
two arbitrary constants ($\alpha$ and $\beta$). In the general case, a system of
$n$ differential equations has an $n \times n$ matrix of coefficients, which gives rise
to $n$ independent solutions of the form $\mathbf{v}e^{\lambda t}$. The general solution is a
general linear combination of these and contains $n$ arbitrary constants.


You should note that the method outlined above works only when the
$n \times n$ matrix of coefficients has $n$ (linearly independent) eigenvectors. The
method fails if it has fewer. However, we do not consider such anomalous
cases in this module.


Although this method works for both real and complex eigenvalues, in the 
next subsection we will find that it is convenient to modify it slightly in 
the complex case. So we use it in this form only for the real case. We 
summarise our method for real eigenvalues as follows. 




Procedure 1 Solution of a homogeneous system with real eigenvalues

To solve a system of linear constant-coefficient first-order differential
equations $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, where $\mathbf{A}$ is an $n \times n$ matrix, do the following.

1. Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and a corresponding set of
eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

2. Write down the general solution in the form

\[ \mathbf{x} = C_1\mathbf{v}_1 e^{\lambda_1 t} + C_2\mathbf{v}_2 e^{\lambda_2 t} + \cdots + C_n\mathbf{v}_n e^{\lambda_n t}, \qquad (19) \]

where $C_1, C_2, \ldots, C_n$ are arbitrary constants.

The case of complex eigenvalues will be covered in Subsection 1.3.
The case where $\mathbf{A}$ does not have $n$ eigenvectors is beyond the
scope of this module.
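The two steps of Procedure 1 can be sketched in a few lines of Python for the 2 × 2 case. This is only an illustration under stated assumptions: the helper names `eigenpairs_2x2` and `general_solution` are hypothetical, the eigenvalues are assumed to be real and distinct, and the top-right entry of the matrix is assumed to be non-zero so that an eigenvector can be read off from the first row of $(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$.

```python
import math

def eigenpairs_2x2(a, b, c, d):
    """Eigenvalue/eigenvector pairs of the matrix [[a, b], [c, d]].

    Assumption of this sketch: b != 0 and two distinct real eigenvalues.
    """
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det      # discriminant of the characteristic equation
    if b == 0 or disc <= 0:
        raise ValueError("sketch handles only b != 0 with distinct real eigenvalues")
    root = math.sqrt(disc)
    # First row of (A - lam I)v = 0 reads (a - lam) v1 + b v2 = 0, so v = [b, lam - a].
    return [(lam, (b, lam - a)) for lam in ((trace + root) / 2, (trace - root) / 2)]

def general_solution(pairs, constants, t):
    """Evaluate equation (19) for n = 2: the sum of C_k * v_k * exp(lam_k * t)."""
    x = sum(C * v[0] * math.exp(lam * t) for C, (lam, v) in zip(constants, pairs))
    y = sum(C * v[1] * math.exp(lam * t) for C, (lam, v) in zip(constants, pairs))
    return x, y

# Matrix of coefficients from Example 1.
pairs = eigenpairs_2x2(1, 4, 1, -2)
```

For the matrix of Example 1 this returns the eigenvalues 2 and −3. The second eigenvector comes out as a multiple of $[1 \;\; {-1}]^T$ rather than that vector itself, which is harmless because the multiple is absorbed into the arbitrary constant.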


The next example applies this procedure to find the general solution of a 
system of three differential equations. 


Example 2 

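Find the general solution of the system of differential equations

\[ \begin{cases} \dot{x} = 3x + 2y + 2z, \\ \dot{y} = 2x + 2y, \\ \dot{z} = 2x \phantom{{}+2y} + 4z. \end{cases} \]

(Hint: The matrix $\begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{pmatrix}$ has eigenvectors $[2 \;\; 1 \;\; 2]^T$, $[1 \;\; 2 \;\; {-2}]^T$ and $[{-2} \;\; 2 \;\; 1]^T$,
with corresponding eigenvalues $\lambda_1 = 6$, $\lambda_2 = 3$ and $\lambda_3 = 0$.)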
Solution 


The matrix of coefficients is

\[ \mathbf{A} = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{pmatrix}. \]

Using the values for the eigenvalues and eigenvectors in the hint, the
general solution is therefore

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \alpha \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix} e^{6t} + \beta \begin{pmatrix} 1 \\ 2 \\ -2 \end{pmatrix} e^{3t} + \gamma \begin{pmatrix} -2 \\ 2 \\ 1 \end{pmatrix}, \]

where $\alpha$, $\beta$ and $\gamma$ are arbitrary constants. Note that the last term on the
right-hand side corresponds to the term in $e^{\lambda_3 t} = e^{0} = 1$. So in component form this
becomes

\[ \begin{aligned} x &= 2\alpha e^{6t} + \beta e^{3t} - 2\gamma, \\ y &= \alpha e^{6t} + 2\beta e^{3t} + 2\gamma, \\ z &= 2\alpha e^{6t} - 2\beta e^{3t} + \gamma. \end{aligned} \]

Figure 4 Behaviour of the solution to Example 3. For $t = 0$ the solution
(orange line) starts at the point (2, 3), but as $t$ increases it
approaches $y = \tfrac{1}{4}x$ (green dashed line).

Exercise 3
Find the general solution of

\[ \begin{cases} \dot{x} = 5x + 2y, \\ \dot{y} = 2x + 5y. \end{cases} \]

(Hint: The eigenvectors of $\begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}$ are $[1 \;\; 1]^T$ and $[1 \;\; {-1}]^T$, with corresponding
eigenvalues 7 and 3.)


Initial conditions 


Now that we know how to find the general solution of a system of 
differential equations, let us use this to find solutions that satisfy specific 
initial conditions. The basic procedure is the same as used in Units 2 
and 3, i.e. use the initial conditions to find particular values of the 
arbitrary constants. The following example illustrates the idea. 


Example 3 


A particle moves in the $xy$-plane in such a way that its position $(x, y)$ at
any time $t$ satisfies the simultaneous differential equations

\[ \begin{cases} \dot{x} = x + 4y, \\ \dot{y} = x - 2y. \end{cases} \]

Find the position $(x, y)$ at time $t$ if $x(0) = 2$ and $y(0) = 3$.
Solution 


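This system of differential equations was solved in Example 1. The general
solution was found to be

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \alpha \begin{pmatrix} 4 \\ 1 \end{pmatrix} e^{2t} + \beta \begin{pmatrix} 1 \\ -1 \end{pmatrix} e^{-3t}, \]

where $\alpha$ and $\beta$ are arbitrary constants.

Since $x(0) = 2$ and $y(0) = 3$, we have, on putting $t = 0$,

\[ \begin{cases} 2 = 4\alpha + \beta, \\ 3 = \phantom{4}\alpha - \beta. \end{cases} \]

Solving these equations gives $\alpha = 1$, $\beta = -2$, so the required particular
solution is

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \end{pmatrix} e^{2t} - 2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} e^{-3t}. \]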


In the above example the particle starts at the point (2, 3) when $t = 0$, and
follows a certain path as $t$ increases. The ultimate direction of this path is
easy to determine, because $e^{-3t}$ is much smaller than $e^{2t}$ when $t$ is large, so
we have $[x \;\; y]^T \simeq [4 \;\; 1]^T e^{2t}$, that is, $x \simeq 4e^{2t}$ and $y \simeq e^{2t}$, so $y \simeq \tfrac{1}{4}x$.
Thus the solution approaches the line $y = \tfrac{1}{4}x$ as $t$ increases. This
behaviour is illustrated in Figure 4.
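The calculation in Example 3 can be checked numerically. The sketch below is an illustration only: the helper names `solve_2x2` and `position` are hypothetical, and Cramer's rule is used simply because the system for the constants is 2 × 2 with a non-zero determinant.

```python
import math

def solve_2x2(m, rhs):
    """Solve a 2x2 linear system by Cramer's rule (assumes a non-zero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return (rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det

v1, v2 = (4.0, 1.0), (1.0, -1.0)    # eigenvectors from Example 1
lam1, lam2 = 2.0, -3.0              # corresponding eigenvalues

# At t = 0 the general solution reduces to alpha*v1 + beta*v2 = (x(0), y(0)).
alpha, beta = solve_2x2(((v1[0], v2[0]), (v1[1], v2[1])), (2.0, 3.0))

def position(t):
    """Particular solution (x(t), y(t)) with x(0) = 2 and y(0) = 3."""
    e1, e2 = math.exp(lam1 * t), math.exp(lam2 * t)
    return (alpha * v1[0] * e1 + beta * v2[0] * e2,
            alpha * v1[1] * e1 + beta * v2[1] * e2)
```

Evaluating `position` at a moderately large time and forming the ratio of the two components gives a value very close to 1/4, in line with the limiting behaviour described in the text.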


Exercise 4 
(a) Use the above method to solve the system of differential equations

\[ \begin{cases} \dot{x} = 5x + 2y, \\ \dot{y} = 2x + 5y, \end{cases} \]

given that $x = 4$ and $y = 0$ when $t = 0$.
(Hint: The result of Exercise 3 will be useful.) 


(b) How does the solution behave for large t? 


Exercise 5 


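A particle moves in three-dimensional space in such a way that its position
$(x, y, z)$ at any time $t$ satisfies the simultaneous differential equations

\[ \begin{cases} \dot{x} = 5x, \\ \dot{y} = \phantom{5}x + 2y + z, \\ \dot{z} = \phantom{5}x + y + 2z. \end{cases} \]

Find the position $(x, y, z)$ at time $t$ if $x(0) = 4$, $y(0) = 6$ and $z(0) = 0$.

(Hint: The eigenvectors of $\begin{pmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$ are $[2 \;\; 1 \;\; 1]^T$, $[0 \;\; 1 \;\; 1]^T$ and $[0 \;\; 1 \;\; {-1}]^T$,
corresponding to eigenvalues 5, 3 and 1.)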


In the next subsection we investigate what happens when the eigenvalues 
of the matrix A are complex numbers. 


1.3 Complex eigenvalues


This subsection deals with complex quantities. Before you begin, try the 
following warm-up exercise. 


Exercise 6 
If $z = 4 + 3i$, show the following.
(a) $\bar{\bar{z}} = z$   (b) $\operatorname{Re} z = \tfrac{1}{2}(z + \bar{z})$   (c) $\operatorname{Im} z = \tfrac{1}{2i}(z - \bar{z})$


So far, all our examples and exercises have involved only real eigenvalues. 
We now investigate what happens when we have complex eigenvalues. In 
fact, because the arguments leading to Procedure 1 do not rely on the 
eigenvalues being real, they can also be used for the complex case. 
However, using equation (19) with complex eigenvalues $\lambda_i$ means that the
arbitrary constants $C_1, C_2, \ldots$ must also be complex for the solution $\mathbf{x}$ to
be real. It would be much more convenient if a real solution x were 
expressed in terms of real quantities only. In this subsection we see how to 
modify equation (19) to a more useful form. 


We begin with a simple example that leads to complex eigenvalues and 
illustrates the problem. Suppose that we want to solve the system of 
differential equations 


\[ \begin{cases} \dot{x} = y, \\ \dot{y} = -x. \end{cases} \qquad (20) \]




We find the solution using Procedure 1. The matrix of coefficients is

\[ \mathbf{A} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \]

and the eigenvalues of $\mathbf{A}$ are easily shown to be

$\lambda = i$ with eigenvector $[1 \;\; i]^T$,
$\lambda = -i$ with eigenvector $[1 \;\; {-i}]^T$.

Notice that one eigenvalue is the complex conjugate of the other, and
the same is true of the eigenvectors.

Now since Procedure 1 works for complex as well as real eigenvalues, the
general solution of the given system of differential equations can be written

\[ \begin{pmatrix} x \\ y \end{pmatrix} = C \begin{pmatrix} 1 \\ i \end{pmatrix} e^{it} + D \begin{pmatrix} 1 \\ -i \end{pmatrix} e^{-it}, \qquad (21) \]

where $C$ and $D$ are arbitrary complex constants.


Since $x$ and $y$ are real, we would very much like the right-hand side of
equation (21) to be written in terms of real quantities. (The problem is
analogous to the one in Unit 3 when we had complex roots of the auxiliary
equation.) In order to see how to do this, notice that the eigenvalues and
eigenvectors $\lambda$ and $\mathbf{v}$ occur in complex conjugate pairs. (The complex
conjugate of a vector $\mathbf{v}$ is the vector $\bar{\mathbf{v}}$ whose elements are the complex
conjugates of the respective elements of $\mathbf{v}$. For example, if
$\mathbf{v} = [1 + 2i \;\; {-3i}]^T$, then $\bar{\mathbf{v}} = [1 - 2i \;\; 3i]^T$.)
This is always true for a matrix with real elements. So the solution will
always be of the form


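\[ \mathbf{x} = C\mathbf{v}e^{\lambda t} + D\bar{\mathbf{v}}e^{\bar{\lambda} t}, \qquad (22) \]

where $C$ and $D$ are arbitrary complex constants. This is the form of
equation (21). Now, since $\mathbf{x}$ is real, when we take the complex conjugate of
both sides, we get

\[ \mathbf{x} = \bar{\mathbf{x}} = \bar{C}\bar{\mathbf{v}}e^{\bar{\lambda} t} + \bar{D}\mathbf{v}e^{\lambda t}. \qquad (23) \]

Here we have used the fact that $\mathbf{x}$ and $t$ are real, and $\bar{\bar{z}} = z$ for any $z$.
Comparing equations (22) and (23), we see that $D = \bar{C}$ and hence

\[ \mathbf{x} = C\mathbf{v}e^{\lambda t} + \bar{C}\bar{\mathbf{v}}e^{\bar{\lambda} t}. \qquad (24) \]

This equation for $\mathbf{x}$ is now manifestly real. To see this, simply take the
complex conjugate of both sides and check that $\bar{\mathbf{x}} = \mathbf{x}$. Now let us set
$C = \alpha + i\beta$, where $\alpha$ and $\beta$ are real. Then

\[ \begin{aligned} \mathbf{x} &= (\alpha + i\beta)\mathbf{v}e^{\lambda t} + (\alpha - i\beta)\bar{\mathbf{v}}e^{\bar{\lambda} t} \\ &= \alpha\left(\mathbf{v}e^{\lambda t} + \bar{\mathbf{v}}e^{\bar{\lambda} t}\right) + i\beta\left(\mathbf{v}e^{\lambda t} - \bar{\mathbf{v}}e^{\bar{\lambda} t}\right). \end{aligned} \]

But for any complex number $z$, $\operatorname{Re} z = \tfrac{1}{2}(z + \bar{z})$ and $\operatorname{Im} z = \tfrac{1}{2i}(z - \bar{z})$,
hence

\[ \mathbf{x} = 2\alpha \operatorname{Re}(\mathbf{v}e^{\lambda t}) - 2\beta \operatorname{Im}(\mathbf{v}e^{\lambda t}). \]

Since $C$ was an arbitrary complex constant, $\alpha$ and $\beta$ are arbitrary real
constants, so we can absorb the factors of 2 and −2 into them, giving

\[ \mathbf{x} = \alpha \operatorname{Re}(\mathbf{v}e^{\lambda t}) + \beta \operatorname{Im}(\mathbf{v}e^{\lambda t}). \qquad (25) \]

Euler's formula was used in a similar way in Unit 3, where it
gave trigonometric solutions of second-order differential
equations.

This is a simple form for $\mathbf{x}$ that is obviously real and as such is sometimes
called a real-valued solution. Let us apply it to the preceding example.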


Example 4 


For the system in equation (20), find a form of the solution x that is 
clearly real. 
Solution 
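The eigenvalues of the matrix of coefficients are

$\lambda = i$ with eigenvector $\mathbf{v} = [1 \;\; i]^T$

and their complex conjugates

$\bar{\lambda} = -i$ with eigenvector $\bar{\mathbf{v}} = [1 \;\; {-i}]^T$.

We are aiming to use equation (25), so let us choose $\lambda = i$ and $\mathbf{v} = [1 \;\; i]^T$.

We use Euler's formula, $e^{it} = \cos t + i\sin t$, to find the real and imaginary
parts of $\mathbf{v}e^{\lambda t}$:

\[ \mathbf{v}e^{\lambda t} = \begin{pmatrix} 1 \\ i \end{pmatrix} (\cos t + i\sin t) = \underbrace{\begin{pmatrix} \cos t \\ -\sin t \end{pmatrix}}_{\text{real part}} + i \underbrace{\begin{pmatrix} \sin t \\ \cos t \end{pmatrix}}_{\text{imaginary part}}. \]

We can now apply equation (25), to obtain

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \alpha \begin{pmatrix} \cos t \\ -\sin t \end{pmatrix} + \beta \begin{pmatrix} \sin t \\ \cos t \end{pmatrix}, \]

where $\alpha$ and $\beta$ are arbitrary real constants.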


For the case of a system of more than two differential equations, complex
eigenvalues and eigenvectors also come in complex conjugate pairs, i.e. if
$\lambda$, $\mathbf{v}$ are an eigenvalue and eigenvector, then so are $\bar{\lambda}$, $\bar{\mathbf{v}}$. So we generalise
Procedure 1, incorporating equation (25), in the following way.


Procedure 2 Solution of a homogeneous system with 
complex eigenvalues 


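To obtain a real-valued solution of a system of linear
constant-coefficient first-order differential equations $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, where $\mathbf{A}$
is an $n \times n$ matrix with distinct eigenvalues, some of which are
complex (occurring in complex conjugate pairs $\lambda$ and $\bar{\lambda}$, with
corresponding complex conjugate eigenvectors $\mathbf{v}$ and $\bar{\mathbf{v}}$), do the
following.


1. Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and a corresponding set of
eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.


2. Write down the general solution in the form

\[ \mathbf{x} = C_1\mathbf{v}_1 e^{\lambda_1 t} + C_2\mathbf{v}_2 e^{\lambda_2 t} + \cdots + C_n\mathbf{v}_n e^{\lambda_n t}. \]


3. Replace the complex terms $\mathbf{v}e^{\lambda t}$ and $\bar{\mathbf{v}}e^{\bar{\lambda} t}$ appearing in the
general solution with $\operatorname{Re}(\mathbf{v}e^{\lambda t})$ and $\operatorname{Im}(\mathbf{v}e^{\lambda t})$.


The general solution will then be real-valued for real $C_1, C_2, \ldots, C_n$.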
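Step 3 of Procedure 2 can be illustrated with Python's built-in complex numbers. The sketch below is only a numerical check, and the function name `real_solution` is hypothetical; it evaluates equation (25) for a given eigenvalue and eigenvector, and is set up here for system (20), where $\lambda = i$ and $\mathbf{v} = [1 \;\; i]^T$.

```python
import cmath
import math

def real_solution(lam, v, alpha, beta, t):
    """Evaluate alpha*Re(v e^{lam t}) + beta*Im(v e^{lam t}), as in equation (25)."""
    growth = cmath.exp(lam * t)     # complex exponential e^{lam t}
    return tuple(alpha * (vk * growth).real + beta * (vk * growth).imag for vk in v)

# Eigenvalue and eigenvector of the matrix of coefficients of system (20).
lam, v = 1j, (1, 1j)
```

With $\alpha = 1$ and $\beta = 0$ the function returns $(\cos t, -\sin t)$, and with $\alpha = 0$ and $\beta = 1$ it returns $(\sin t, \cos t)$, which are the real and imaginary parts found in Example 4.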


Example 5 


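(a) Find the general solution of the system of differential equations

\[ \begin{cases} \dot{x} = 3x - y, \\ \dot{y} = 2x + y. \end{cases} \]

(Hint: The eigenvectors of $\begin{pmatrix} 3 & -1 \\ 2 & 1 \end{pmatrix}$ are $\mathbf{v} = [1 \;\; 1 - i]^T$ and $\bar{\mathbf{v}} = [1 \;\; 1 + i]^T$,
with corresponding eigenvalues $\lambda = 2 + i$ and $\bar{\lambda} = 2 - i$.)
(b) Find the particular solution satisfying $x = 3$ and $y = 1$ when $t = 0$.
Solution


(a) The matrix of coefficients is

\[ \mathbf{A} = \begin{pmatrix} 3 & -1 \\ 2 & 1 \end{pmatrix}, \]

so using the hint, the general solution can be written as

\[ \mathbf{x} = C\mathbf{v}e^{\lambda t} + D\bar{\mathbf{v}}e^{\bar{\lambda} t} = C \begin{pmatrix} 1 \\ 1 - i \end{pmatrix} e^{(2+i)t} + D \begin{pmatrix} 1 \\ 1 + i \end{pmatrix} e^{(2-i)t}, \]

where $C$ and $D$ are arbitrary complex constants.


To obtain a real-valued solution, we follow Procedure 2 and write

\[ \begin{aligned} \mathbf{v}e^{\lambda t} &= \begin{pmatrix} 1 \\ 1 - i \end{pmatrix} e^{(2+i)t} \\ &= e^{2t} \begin{pmatrix} \cos t + i\sin t \\ (1 - i)(\cos t + i\sin t) \end{pmatrix} \\ &= e^{2t} \begin{pmatrix} \cos t + i\sin t \\ (\cos t + \sin t) + i(\sin t - \cos t) \end{pmatrix} \\ &= \underbrace{e^{2t} \begin{pmatrix} \cos t \\ \cos t + \sin t \end{pmatrix}}_{\text{real part}} + i \underbrace{e^{2t} \begin{pmatrix} \sin t \\ \sin t - \cos t \end{pmatrix}}_{\text{imaginary part}}. \end{aligned} \]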

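The real-valued general solution of the given system of equations is
therefore

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \alpha e^{2t} \begin{pmatrix} \cos t \\ \cos t + \sin t \end{pmatrix} + \beta e^{2t} \begin{pmatrix} \sin t \\ \sin t - \cos t \end{pmatrix}, \qquad (26) \]

where $\alpha$ and $\beta$ are arbitrary real constants.


(b) In order to find the required particular solution, we substitute $x = 3$,
$y = 1$ and $t = 0$ into equation (26), to obtain

\[ \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \]

so $3 = \alpha$ and $1 = \alpha - \beta$, giving $\beta = 2$, and the solution is therefore

\[ \begin{pmatrix} x \\ y \end{pmatrix} = e^{2t} \begin{pmatrix} 3\cos t + 2\sin t \\ \cos t + 5\sin t \end{pmatrix}. \]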


Exercise 7 
(a) Find the general solution of the system of differential equations 
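\[ \begin{cases} \dot{x} = -3x - 2y, \\ \dot{y} = \phantom{-}4x + y. \end{cases} \]

(Hint: The eigenvectors of $\begin{pmatrix} -3 & -2 \\ 4 & 1 \end{pmatrix}$ are $[1 \;\; {-1} - i]^T$ and $[1 \;\; {-1} + i]^T$ with
corresponding eigenvalues $-1 + 2i$ and $-1 - 2i$.)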


(b) Find the particular solution satisfying « = y = 1 when t = 0. 


Exercise 8 


(a) Find the general real-valued solution of the system of equations 


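\[ \begin{cases} \dot{x} = \phantom{-}x \phantom{{}+2y} + z, \\ \dot{y} = \phantom{-}x + 2y + z, \\ \dot{z} = -x \phantom{{}+2y} + z. \end{cases} \]

(Hint: The eigenvectors of $\begin{pmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{pmatrix}$ are $[0 \;\; 1 \;\; 0]^T$, $[1 \;\; {-i} \;\; i]^T$ and $[1 \;\; i \;\; {-i}]^T$,
with corresponding eigenvalues 2, $\lambda = 1 + i$ and $\bar{\lambda} = 1 - i$.)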


(b) Find the solution for which « = y = 1 and z = 2 when t = 0. 


Anomalous cases 


The method outlined in the last two subsections works only when the
$n \times n$ matrix of coefficients has $n$ (linearly independent) eigenvectors.
As was mentioned in Unit 5, it can happen that an $n \times n$ matrix has

1.4 First-order inhomogeneous systems


In the previous subsections you saw how to solve a system of differential
equations of the form $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, where $\mathbf{A}$ is a given constant-coefficient
matrix. We now extend our discussion to systems of the form

\[ \dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}(t), \]

where $\mathbf{h}(t)$ is a given function of $t$. (Here we write $\mathbf{h}(t)$ to emphasise
that $\mathbf{h}$ is a function of $t$. Henceforth we will abbreviate this to $\mathbf{h}$.) Our
method involves finding a 'particular integral' for the system, and mirrors
the approach that we took for inhomogeneous second-order differential
equations in Unit 3.


In Unit 3 we discussed inhomogeneous differential equations such as

\[ \frac{d^2y}{dx^2} + 9y = 2e^{3x}. \qquad (27) \]

(See Unit 3, Example 9.) To solve such an equation, we proceed as follows.


1. We first find the complementary function of the corresponding
homogeneous equation

\[ \frac{d^2y}{dx^2} + 9y = 0, \]

which is, in this case,

\[ y_c = C_1\cos 3x + C_2\sin 3x, \]

where $C_1$ and $C_2$ are arbitrary constants.


2. We then find a particular integral of the inhomogeneous equation (27).
It is easy to check that

\[ y_p = \tfrac{1}{9}e^{3x} \]

is such a particular integral.

The general solution $y$ of the original equation is then obtained by adding
these two functions to give

\[ y = y_c + y_p = C_1\cos 3x + C_2\sin 3x + \tfrac{1}{9}e^{3x}. \]


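A similar situation holds for systems of linear first-order differential
equations. For example, in order to find the general solution of the
inhomogeneous system

\[ \begin{cases} \dot{x} = 3x + 2y + 4e^{3t}, \\ \dot{y} = \phantom{3}x + 4y - e^{3t}, \end{cases} \qquad (28) \]

which in matrix form becomes

\[ \begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 4e^{3t} \\ -e^{3t} \end{pmatrix}, \]

we first find the general solution of the corresponding homogeneous system

\[ \begin{cases} \dot{x} = 3x + 2y, \\ \dot{y} = \phantom{3}x + 4y, \end{cases} \]

which is the complementary function

\[ \mathbf{x}_c = \begin{pmatrix} x_c \\ y_c \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t}, \qquad (29) \]

where $\alpha$ and $\beta$ are arbitrary constants (see the solution to Exercise 2(b)).
System (28) is $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$, where $\mathbf{h} = [4e^{3t} \;\; {-e^{3t}}]^T$.


We next find a particular solution, or particular integral, of the original
inhomogeneous system (28). In Subsection 1.5 we will show that

\[ \mathbf{x}_p = \begin{pmatrix} x_p \\ y_p \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \end{pmatrix} e^{3t} \qquad (30) \]

is such a particular integral. We use the term particular integral rather
than particular solution. The latter is more appropriately used for the
solution of system (28) that satisfies given initial or boundary conditions.
The general solution of the original system (28) is then obtained by adding
equations (29) and (30):

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x_c \\ y_c \end{pmatrix} + \begin{pmatrix} x_p \\ y_p \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t} + \begin{pmatrix} 3 \\ -2 \end{pmatrix} e^{3t}. \]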

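That this is the general solution can be seen as follows. Since $\mathbf{x}_c$ is the
general solution of the homogeneous equation $\dot{\mathbf{x}}_c = \mathbf{A}\mathbf{x}_c$, and $\mathbf{x}_p$ is a
particular integral of the inhomogeneous equation $\dot{\mathbf{x}}_p = \mathbf{A}\mathbf{x}_p + \mathbf{h}$, setting
$\mathbf{x} = \mathbf{x}_c + \mathbf{x}_p$ gives

\[ \begin{aligned} \dot{\mathbf{x}} = \dot{\mathbf{x}}_c + \dot{\mathbf{x}}_p &= \mathbf{A}\mathbf{x}_c + \mathbf{A}\mathbf{x}_p + \mathbf{h} \\ &= \mathbf{A}(\mathbf{x}_c + \mathbf{x}_p) + \mathbf{h} \\ &= \mathbf{A}\mathbf{x} + \mathbf{h}. \end{aligned} \]

Therefore $\mathbf{x}$ is a solution of the inhomogeneous equation and contains two
arbitrary constants (from $\mathbf{x}_c$). This is sufficient to guarantee that $\mathbf{x}$ must
be the general solution of the inhomogeneous equation. This argument is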


of course true for a system of any number of differential equations, so we 
have the following result. 


General solution of an inhomogeneous system 


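If $\mathbf{x}_c$ is the complementary function of the homogeneous system
$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$, and $\mathbf{x}_p$ is a particular integral of the system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$,
then $\mathbf{x}_c + \mathbf{x}_p$ is the general solution of the system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$.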
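The boxed result can be checked numerically for system (28). The following Python sketch is an illustration only, with hypothetical function names; it assumes the eigenvectors $[1 \;\; 1]^T$ and $[{-2} \;\; 1]^T$ (eigenvalues 5 and 2) for the complementary function and the particular integral $[3 \;\; {-2}]^T e^{3t}$ of equation (30), differentiates the sum by hand, and evaluates the residual $\dot{\mathbf{x}} - (\mathbf{A}\mathbf{x} + \mathbf{h})$.

```python
import math

A = ((3.0, 2.0), (1.0, 4.0))        # matrix of coefficients of system (28)

def h(t):
    """Inhomogeneous term of system (28)."""
    return (4 * math.exp(3 * t), -math.exp(3 * t))

def x_general(alpha, beta, t):
    """Complementary function plus particular integral."""
    e5, e2, e3 = math.exp(5 * t), math.exp(2 * t), math.exp(3 * t)
    return (alpha * e5 - 2 * beta * e2 + 3 * e3,
            alpha * e5 + beta * e2 - 2 * e3)

def x_dot(alpha, beta, t):
    """Time derivative of x_general, differentiated term by term."""
    e5, e2, e3 = math.exp(5 * t), math.exp(2 * t), math.exp(3 * t)
    return (5 * alpha * e5 - 4 * beta * e2 + 9 * e3,
            5 * alpha * e5 + 2 * beta * e2 - 6 * e3)

def residual(alpha, beta, t):
    """Left-hand side minus right-hand side of system (28); should vanish."""
    x, y = x_general(alpha, beta, t)
    dx, dy = x_dot(alpha, beta, t)
    hx, hy = h(t)
    return (dx - (A[0][0] * x + A[0][1] * y + hx),
            dy - (A[1][0] * x + A[1][1] * y + hy))
```

The residual is zero, up to rounding error, for every choice of the constants `alpha` and `beta`, which is the numerical counterpart of the argument given in the text.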


Exercise 9 
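Write down the general solution of the system

\[ \begin{cases} \dot{x} = 3x + 2y + t, \\ \dot{y} = \phantom{3}x + 4y + 7t, \end{cases} \]

given that a particular integral is

\[ x_p = t + \tfrac{4}{5}, \qquad y_p = -2t - \tfrac{7}{10}. \]

(Hint: The eigenvectors of $\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}$ are $[1 \;\; 1]^T$ and $[{-2} \;\; 1]^T$, with corresponding
eigenvalues 5 and 2.)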


Although it is easy to verify that equation (30) is a solution of the given 
system, by direct substitution, we now show you how to determine it. 


1.5 Finding particular integrals 


We now show you how to find a particular integral $\mathbf{x}_p$ in some special
cases. We consider the system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$ in the situations where $\mathbf{h}$ is a
vector whose components are:


e polynomial functions 
e exponential functions. 


Our treatment will be similar to that in Unit 3, where we found particular 
integrals for linear second-order differential equations using the method of 
undetermined coefficients. To illustrate the ideas involved, we consider the 
system 


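\[ \dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}, \quad \text{where } \mathbf{A} = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}. \]

The first stage in solving any inhomogeneous system is to find the
complementary function, that is, the solution of the system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}$. The
complementary function for this system was found in the solution to
Exercise 2(b):

\[ \begin{pmatrix} x_c \\ y_c \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t}. \qquad (31) \]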


To this complementary function we add a particular integral that depends 
on the form of h. We now look at examples of the above two forms for h, 
and derive a particular integral in each case. 


Example 6
Find the general solution of the system

\[ \begin{cases} \dot{x} = 3x + 2y + t, \\ \dot{y} = \phantom{3}x + 4y + 7t. \end{cases} \]

Here $\mathbf{h} = [t \;\; 7t]^T$, so $\mathbf{h}$ is linear in $t$.

Solution
The complementary function is given in equation (31).


We note that $\mathbf{h}$ consists entirely of linear functions, so it seems natural to
seek a particular integral of the form

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} at + b \\ ct + d \end{pmatrix}, \]

where $a$, $b$, $c$ and $d$ are constants that we need to determine. (You may have
been tempted to use a simpler trial solution, of the form
$[x \;\; y]^T = [at \;\; ct]^T$. Unfortunately, this does not work: try it and see!
You may recall something similar in Unit 3.) So $x = at + b$,
$y = ct + d$, and differentiating these we get $\dot{x} = a$, $\dot{y} = c$. Substituting
these values into the simultaneous equations gives

\[ \begin{cases} a = 3(at + b) + 2(ct + d) + t, \\ c = (at + b) + 4(ct + d) + 7t. \end{cases} \]

We now rearrange these equations, separating constant terms from the
terms linear in $t$:

\[ \begin{cases} (3a + 2c + 1)t + (3b + 2d - a) = 0, \\ (a + 4c + 7)t + (b + 4d - c) = 0. \end{cases} \qquad (32) \]

These equations hold for all values of $t$, which means that each of the
bracketed terms must be zero. Equating the coefficients of $t$ to zero in
equations (32) gives

\[ \begin{cases} 3a + 2c + 1 = 0, \\ \phantom{3}a + 4c + 7 = 0, \end{cases} \]

which have the solution

\[ a = 1, \quad c = -2. \]


Equating the constant terms to zero in equations (32), and putting $a = 1$,
$c = -2$, gives the equations

\[ \begin{cases} 3b + 2d - 1 = 0, \\ \phantom{3}b + 4d + 2 = 0, \end{cases} \]

which have the solution

\[ b = \tfrac{4}{5}, \quad d = -\tfrac{7}{10}. \]


Thus the required particular integral is

\[ \begin{pmatrix} x_p \\ y_p \end{pmatrix} = \begin{pmatrix} t + \tfrac{4}{5} \\ -2t - \tfrac{7}{10} \end{pmatrix}, \]

and the general solution is

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x_c \\ y_c \end{pmatrix} + \begin{pmatrix} x_p \\ y_p \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t} + \begin{pmatrix} t + \tfrac{4}{5} \\ -2t - \tfrac{7}{10} \end{pmatrix}. \]
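The coefficient matching in Example 6 amounts to solving two small linear systems, which can be sketched in Python with exact fractions. This is only an illustration: the helper names are hypothetical, the forcing term is assumed to have the form $\mathbf{p}t + \mathbf{q}$ with constant vectors $\mathbf{p}$ and $\mathbf{q}$, and the matrix $\mathbf{A}$ is assumed to be invertible. Substituting the trial solution $\mathbf{x}_p = \mathbf{a}t + \mathbf{b}$ gives $\mathbf{A}\mathbf{a} + \mathbf{p} = \mathbf{0}$ from the terms in $t$ and $\mathbf{A}\mathbf{b} + \mathbf{q} = \mathbf{a}$ from the constant terms.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def mat_vec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def linear_particular_integral(A, p, q):
    """Coefficients (a, b) of x_p = a t + b for the system x' = A x + p t + q."""
    A_inv = inverse_2x2(A)
    # Terms in t:      0 = A a + p   so   a = -A^{-1} p
    a = tuple(-component for component in mat_vec(A_inv, p))
    # Constant terms:  a = A b + q   so   b = A^{-1} (a - q)
    b = mat_vec(A_inv, tuple(ai - qi for ai, qi in zip(a, q)))
    return a, b

F = Fraction
A = ((F(3), F(2)), (F(1), F(4)))    # matrix of coefficients in Example 6
a, b = linear_particular_integral(A, (F(1), F(7)), (F(0), F(0)))
```

For Example 6 this reproduces the coefficients of $t$ (1 and −2) in `a` and the constant terms (4/5 and −7/10) in `b`.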
Exercise 10
Find the general solution of the system

\[ \begin{cases} \dot{x} = x + 4y - t + 2, \\ \dot{y} = x - 2y + 5t. \end{cases} \]

(Hint: For the complementary function, see Example 1.)


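Example 7
Find the general solution of the system

\[ \begin{cases} \dot{x} = 3x + 2y + 4e^{3t}, \\ \dot{y} = \phantom{3}x + 4y - e^{3t}. \end{cases} \]

Here $\mathbf{h} = [4e^{3t} \;\; {-e^{3t}}]^T$, so $\mathbf{h}$ is exponential.

Solution


The complementary function is given in equation (31). We note that both
components of $\mathbf{h}$ include the same exponential function $e^{3t}$, so it seems
natural to seek a particular integral of the form

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ae^{3t} \\ be^{3t} \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix} e^{3t}, \]

where $a$ and $b$ are constants that we need to determine. So $x = ae^{3t}$,
$y = be^{3t}$, and differentiating these gives $\dot{x} = 3ae^{3t}$, $\dot{y} = 3be^{3t}$. Substituting
these values into the simultaneous equations gives

\[ \begin{cases} 3ae^{3t} = 3ae^{3t} + 2be^{3t} + 4e^{3t}, \\ 3be^{3t} = ae^{3t} + 4be^{3t} - e^{3t}, \end{cases} \]

or, on dividing by $e^{3t}$,

\[ \begin{cases} 3a = 3a + 2b + 4, \\ 3b = a + 4b - 1. \end{cases} \]

Rearranging these equations gives

\[ \begin{cases} 2b = -4, \\ a + b = 1, \end{cases} \]

which have the solution

\[ a = 3, \quad b = -2. \]


Thus the required particular integral is

\[ \begin{pmatrix} x_p \\ y_p \end{pmatrix} = \begin{pmatrix} 3e^{3t} \\ -2e^{3t} \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \end{pmatrix} e^{3t}, \]

and the general solution is

\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x_c \\ y_c \end{pmatrix} + \begin{pmatrix} x_p \\ y_p \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t} + \begin{pmatrix} 3 \\ -2 \end{pmatrix} e^{3t}. \]

Exercise 11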
Find the general solution of the system 
g=0+4y+4e—, 
ee ee: 


(Hint: The complementary function is the same as that of Exercise 10.) 


We summarise our results in the following procedure. 


Procedure 3 Finding particular integrals 


To find a particular integral $\mathbf{x}_p = [x_p \;\; y_p]^T$ for the system
$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$, do the following.


1. When the elements of $\mathbf{h}$ are polynomials of degree $k$ or less,
choose $x_p$ and $y_p$ to be polynomials of degree $k$.


2. When the elements of $\mathbf{h}$ are multiples of the same exponential
function, choose $x_p$ and $y_p$ to be multiples of this exponential
function.


To determine the coefficients in $x_p$ and $y_p$, substitute into the system
of differential equations and equate coefficients if necessary.
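For the exponential case in step 2, substituting the trial solution $\mathbf{x}_p = \mathbf{a}e^{kt}$ into $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}_0 e^{kt}$ gives $k\mathbf{a} = \mathbf{A}\mathbf{a} + \mathbf{h}_0$, that is, $(k\mathbf{I} - \mathbf{A})\mathbf{a} = \mathbf{h}_0$. The Python sketch below solves this 2 × 2 system by Cramer's rule. It is an illustration only: the function name and the symbols $k$ and $\mathbf{h}_0$ are notation introduced here, not taken from the text.

```python
def exponential_particular_integral(A, h0, k):
    """Coefficient vector a of x_p = a e^{kt} for x' = A x + h0 e^{kt} (2x2 case).

    Solves (k I - A) a = h0 by Cramer's rule. Returns None when k I - A is
    singular, i.e. when k is an eigenvalue of A, because this trial form then
    does not determine a unique coefficient vector.
    """
    m11, m12 = k - A[0][0], -A[0][1]
    m21, m22 = -A[1][0], k - A[1][1]
    det = m11 * m22 - m12 * m21
    if det == 0:
        return None
    return ((h0[0] * m22 - m12 * h0[1]) / det,
            (m11 * h0[1] - m21 * h0[0]) / det)

A = ((3, 2), (1, 4))                                        # matrix used in Example 7
coeffs = exponential_particular_integral(A, (4, -1), 3)     # forcing [4, -1] e^{3t}
```

For Example 7 the result agrees with $a = 3$, $b = -2$. When the exponent coincides with an eigenvalue of the matrix (for this matrix, 5 or 2), the function returns `None`, signalling that a more general trial form is required.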


Exercise 12 
Consider the system of differential equations 
z= 2c + By + e*, 
{i Qe+ y+4e*. 
(a) Evaluate the eigenvalues and eigenvectors of the matrix of coefficients. 


(b) Find the solution of this system subject to the initial conditions 
x(0) = %, y(0) = 3. 
(c) How does the solution found in part (b) behave for large t? 


Other cases 


This short subsection is included for completeness, but the material 


in it will not be assessed. 


Combinations of cases 


Procedure 3 allows you to determine the particular integral when the
inhomogeneous term $\mathbf{h}$ has components that are simple functions like
polynomials or exponentials. When $\mathbf{h}$ is a linear combination of these
simple functions, for example

\[ \mathbf{h}(t) = \mathbf{h}_1(t) + \mathbf{h}_2(t), \quad \text{where } \mathbf{h}_1 = \begin{pmatrix} 4 \\ -1 \end{pmatrix} e^{3t} \text{ and } \mathbf{h}_2 = \begin{pmatrix} 1 \\ 7 \end{pmatrix} t, \]

we find a particular integral for each of $\mathbf{h}_1$ and $\mathbf{h}_2$ separately, using the
above method (see Examples 6 and 7), then add the two particular
integrals together. We would use a similar trick if $\mathbf{h}_1$ and $\mathbf{h}_2$ were two
exponential terms with different exponents.


Exceptional cases 


Occasionally Procedure 3 will fail to find a particular integral of an
inhomogeneous system $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{h}$. This usually occurs when the
inhomogeneous part $\mathbf{h}$ is related to the complementary function $\mathbf{x}_c$. For
example,

\[ \begin{cases} \dot{x} = 3x + 2y + 6e^{2t}, \\ \dot{y} = \phantom{3}x + 4y + 3e^{2t}, \end{cases} \]

has a complementary function given by equation (31):

\[ \begin{pmatrix} x_c \\ y_c \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^{5t} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} e^{2t}. \]

We see that the term $e^{2t}$ occurs in both the complementary function and
the inhomogeneous part $\mathbf{h}$. If we follow Procedure 3 and try to find a
particular integral of the form $\mathbf{x}_p = [a_1 \;\; a_2]^T e^{2t}$, then we will find that the
system of equations for $a_1$ and $a_2$ has no solution and so the method fails.


The resolution is identical to that discussed in Unit 3 when the method of
undetermined coefficients failed for the same reason: we simply try a more
general form for the particular integral. In this case we would try a
solution of the form

\[ \mathbf{x}_p = \begin{pmatrix} a_1 + b_1 t \\ a_2 + b_2 t \end{pmatrix} e^{2t}. \]

This is analogous to what was done in Unit 3 for second-order differential
equations.


1 First-order systems 
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Figure 5 A compound 
harmonic oscillator; we 
analyse the vertical motion of 
these two masses, connected 
by springs 


Forces and accelerations are 
normally written as vectors. 
However, when we have motion 
in only one direction, we just 
consider the vertical component 
as we have done here. 


Don’t worry if you can’t follow this argument; it is not important for
what follows.

2 Second-order systems


In this section we show how the methods already introduced to solve 
systems of first-order differential equations can be adapted to systems of 
second-order differential equations. We restrict ourselves to considering 
homogeneous second-order systems, as the inhomogeneous case can be 
handled using the same technique as in the previous section. We begin with 
a short motivational section describing how the motion of bodies coupled 
by springs leads naturally to such equations. Don’t worry too much about 
the details, as you will not be assessed on deriving equations of motion. 


2.1 Mechanical oscillations and normal modes 


In Unit 3 we discussed the motion of a mass suspended on a spring. We 
showed that this system exhibits a sinusoidal motion called simple 
harmonic motion. Here we consider a generalisation of this mass and 
spring system, illustrated in Figure 5. The system consists of two particles, 
each of mass m, suspended by springs with spring constant k, with one 
mass suspended below the other. 


This system can rest in equilibrium with both masses stationary. In order
to describe the motion of this system, we need to describe the
displacement of the masses from this equilibrium position. Let the
downward displacement of the upper mass from its equilibrium point
be $x_1$, and let the downward displacement of the lower mass from its
equilibrium point be $x_2$. We need to find equations of motion for $x_1(t)$
and $x_2(t)$. These are obtained from Newton’s second law:


$$m\frac{d^2 x_1}{dt^2} = F_1, \qquad m\frac{d^2 x_2}{dt^2} = F_2, \tag{33}$$

where $F_1$ is the force on the upper mass due to its displacement from its
equilibrium position, and $F_2$ is the force on the lower mass. These forces
are linear in $x_1$ and $x_2$, and can be shown to satisfy

$$F_1 = k(x_2 - x_1) - kx_1 = k(x_2 - 2x_1),$$
$$F_2 = k(x_1 - x_2).$$

We do not expect you to be able to derive such equations. However, the
following is an argument that makes them plausible.


Look at Figure 5. Suppose that we move the upper mass down by a
positive amount $\Delta$ and hold the lower mass still, so that $x_1 = \Delta > 0$
and $x_2 = 0$. Then we expect an upwards restoring force from both
springs on the upper mass, and a downwards restoring force from just
the lower spring on the lower mass.

The equations for the forces give $F_1 = k(x_2 - 2x_1) = -2k\Delta$ and
$F_2 = k(x_1 - x_2) = k\Delta$. So $F_1$ is upwards, $F_2$ is downwards, and $|F_1|$ is
twice $|F_2|$, which is what we expect by intuition.


Substituting these equations for the forces into equations (33), we obtain 
equations of motion for the pair of masses: 


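$$m\ddot{x}_1 = k(x_2 - 2x_1),$$
$$m\ddot{x}_2 = k(x_1 - x_2).$$

This can be written in matrix form as

$$\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix} = \frac{k}{m}\begin{bmatrix} -2 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \tag{34}$$

or as

$$\ddot{\mathbf{x}} = A\mathbf{x}, \quad\text{where } \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \text{ and } A = \frac{k}{m}\begin{bmatrix} -2 & 1 \\ 1 & -1 \end{bmatrix}. \tag{35}$$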


This is similar in form to equation (5) of the Introduction, except that it 
involves second derivatives. In this section and the next we will see that 
there are solutions, derived from the eigenvalues and eigenvectors of A and 
analogous to equation (6), in the form of a constant vector multiplying a 
function of time. 


Among all the possible motions of this system there is a special motion,
where the two masses oscillate with the same angular frequency $\omega$, but
with possibly different amplitudes $a_1$ and $a_2$. In this case the function of
time is a sinusoidal function, and the solution typically has the form

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \sin(\omega t + \phi)\begin{bmatrix} a_1 \\ a_2 \end{bmatrix},$$

where $\phi$ is a constant. These special solutions are called the normal modes
of oscillation of the system and will be discussed in Section 3.


Note that the constant term in the series is equal to zero because the
force is zero at the equilibrium: $F(0) = 0$.

Figure 6 A bowl

Figure 7 Typical path of a ball in an elliptical bowl, looked at from above

When the displacement $x$ is small, the quadratic and higher-order
terms can be neglected, so that $F \simeq Cx$ and hence using Newton’s
second law we get

$$\ddot{x} = Ax,$$

where $A = C/m$, and $m$ represents the mass of the system. If two or
more coordinates $(x_1, x_2, \ldots)$ are required to describe the
displacement from equilibrium, then the equation of motion for the
small displacements takes a form similar to equation (35):

$$\ddot{\mathbf{x}} = A\mathbf{x}, \tag{36}$$

where $A$ is a matrix and $\mathbf{x} = [x_1 \;\; x_2 \;\; \ldots]^T$ is a vector formed by the
displacements. This equation describes the dynamics of most physical
systems near equilibrium.


A simple example of this is when a ball bearing (a small metal ball) is 
placed in a bowl, like that in Figure 6, and set in motion. In that 
case, x represents the vector displacement of the ball from the lowest 
part of the bowl, when looked at from above. And when |x| is not too 
large, the motion of the ball can be described by equation (36), where 
the coefficients of the matrix A are determined from the shape of the 
bottom of the bowl. Figure 7 illustrates a typical path that you might 
see if you looked down onto the surface of the bowl from above. 


Let us now move on to the topic of how to solve systems of second-order 
differential equations. 


2.2 Solving second-order systems 


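We will consider systems of linear constant-coefficient second-order
differential equations of the form

$$\ddot{\mathbf{x}} = A\mathbf{x}. \tag{37}$$

We will try to solve this equation in a similar manner to the way in which
we solved the first-order case $\dot{\mathbf{x}} = A\mathbf{x}$ in the previous section.

Let $\mathbf{v}$ be an eigenvector of the matrix $A$ with eigenvalue $\lambda$, so $A\mathbf{v} = \lambda\mathbf{v}$.
For the moment we will assume $\lambda \neq 0$ (the case $\lambda = 0$ will be covered
shortly). Let us try a solution of the form $\mathbf{x}(t) = \mathbf{v}e^{\mu t}$ for some number $\mu$.
Substituting into the left- and right-hand sides of equation (37), we get

$$\frac{d^2}{dt^2}\left(\mathbf{v}e^{\mu t}\right) = A\mathbf{v}e^{\mu t},$$

so

$$\mu^2\mathbf{v}e^{\mu t} = \lambda\mathbf{v}e^{\mu t}.$$

Thus $\mu^2 = \lambda$, hence $\mathbf{x}(t) = \mathbf{v}e^{\sqrt{\lambda}\,t}$ and $\mathbf{x}(t) = \mathbf{v}e^{-\sqrt{\lambda}\,t}$ are both solutions. So
we have the following result.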


For a system of differential equations

$$\ddot{\mathbf{x}} = A\mathbf{x},$$

solutions are given by

$$\mathbf{x}(t) = \mathbf{v}e^{\pm\sqrt{\lambda}\,t},$$

where $\lambda$ is an eigenvalue of the matrix $A$ corresponding to an
eigenvector $\mathbf{v}$.


The general solution is, as you may already have guessed, a linear
combination of the solutions $\mathbf{v}e^{\sqrt{\lambda}\,t}$ and $\mathbf{v}e^{-\sqrt{\lambda}\,t}$ for all eigenvalues $\lambda$. We
must consider two cases: $\lambda > 0$ and $\lambda < 0$. Let us consider the positive
case first; the following example illustrates how to construct the general
solution for a pair of simultaneous differential equations.


Example 8 
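Find the general solution of the system of differential equations

$$\begin{cases} \ddot{x} = 3x + 2y, \\ \ddot{y} = x + 4y. \end{cases}$$

(Hint: The eigenvectors of $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ are $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$, with
corresponding eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 2$.)

Solution

The matrix of coefficients is

$$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}.$$

Using the hint, it follows that $\mathbf{v}_1 e^{\pm\sqrt{\lambda_1}\,t} = [1 \;\; 1]^T e^{\pm\sqrt{5}\,t}$ and
$\mathbf{v}_2 e^{\pm\sqrt{\lambda_2}\,t} = [-2 \;\; 1]^T e^{\pm\sqrt{2}\,t}$ are solutions. Hence the general solution is a
linear combination of these:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\left(C_1 e^{\sqrt{5}\,t} + C_2 e^{-\sqrt{5}\,t}\right) + \begin{bmatrix} -2 \\ 1 \end{bmatrix}\left(C_3 e^{\sqrt{2}\,t} + C_4 e^{-\sqrt{2}\,t}\right),$$

where $C_1$, $C_2$, $C_3$ and $C_4$ are arbitrary constants.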
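The four building-block solutions of Example 8 can be checked numerically. The sketch below is only an illustration: the helper names, the sample time t = 0.3 and the finite-difference step are assumptions, not part of the unit. It estimates the second derivative of each candidate solution by central differences and compares it with A x.

```python
import math

# Matrix and eigenpairs quoted in Example 8.
A = [[3.0, 2.0], [1.0, 4.0]]
EIGENPAIRS = [(5.0, [1.0, 1.0]), (2.0, [-2.0, 1.0])]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def exp_solution(v, lam, sign, t):
    """Evaluate x(t) = v * exp(sign * sqrt(lam) * t) for lam > 0."""
    factor = math.exp(sign * math.sqrt(lam) * t)
    return [factor * c for c in v]


def second_derivative(x, t, h=1e-3):
    """Central-difference estimate of the second derivative of x at t."""
    xp, x0, xm = x(t + h), x(t), x(t - h)
    return [(p - 2.0 * c + m) / h**2 for p, c, m in zip(xp, x0, xm)]


def max_residual(v, lam, sign, t=0.3):
    """Largest component of |x''(t) - A x(t)| for one candidate solution."""
    x = lambda s: exp_solution(v, lam, sign, s)
    lhs = second_derivative(x, t)
    rhs = mat_vec(A, x(t))
    return max(abs(l - r) for l, r in zip(lhs, rhs))


# One residual per solution: two eigenvalues, two signs of the square root.
residuals = [max_residual(v, lam, sign)
             for lam, v in EIGENPAIRS for sign in (+1, -1)]
print(residuals)
```

All four residuals should be small compared with the size of the solution itself; they are not exactly zero because the second derivative is only approximated.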


In this example, we see that two second-order differential equations give
rise to a general solution with four arbitrary constants: four terms arise
from the positive and negative square roots of each eigenvalue. Not
surprisingly, a system of $n$ second-order differential equations gives rise to
a general solution with $2n$ arbitrary constants.

Exercise 13

Find the general solution of the system of differential equations

$$\begin{cases} \ddot{x} = 5x + 2y, \\ \ddot{y} = 2x + 5y. \end{cases}$$

(Hint: The eigenvectors of $\begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix}$ are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$, with corresponding
eigenvalues 7 and 3.)


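All this is fine when the eigenvalues are positive. However, if an
eigenvalue $\lambda$ is negative, then $\sqrt{\lambda} = i\sqrt{-\lambda}$ is pure imaginary, and we have
solutions of the form

$$\mathbf{x} = C_1\mathbf{v}e^{i\sqrt{-\lambda}\,t} + C_2\mathbf{v}e^{-i\sqrt{-\lambda}\,t} + \text{other eigenvalue terms}.$$

Hence the constants $C_1$ and $C_2$ must be complex for $\mathbf{x}$ to be real. A
similar problem occurred in Subsection 1.3, and as there, we use Euler’s
formula to manipulate our solution into one involving sines and cosines. So
(ignoring the other eigenvalue terms) we have

$$\begin{aligned}
\mathbf{x} &= C_1\mathbf{v}\left(\cos(\sqrt{-\lambda}\,t) + i\sin(\sqrt{-\lambda}\,t)\right) + C_2\mathbf{v}\left(\cos(\sqrt{-\lambda}\,t) - i\sin(\sqrt{-\lambda}\,t)\right) \\
&= (C_1 + C_2)\mathbf{v}\cos(\sqrt{-\lambda}\,t) + i(C_1 - C_2)\mathbf{v}\sin(\sqrt{-\lambda}\,t) \\
&= \alpha\mathbf{v}\cos(\sqrt{-\lambda}\,t) + \beta\mathbf{v}\sin(\sqrt{-\lambda}\,t),
\end{aligned}$$

where $\alpha = C_1 + C_2$ and $\beta = i(C_1 - C_2)$. Since $C_1$ and $C_2$ are arbitrary, so
are $\alpha$ and $\beta$. Furthermore, if $\mathbf{x}$ is real, then clearly $\alpha$ and $\beta$ must be real
too. We summarise this as follows.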


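If $\lambda$ is a negative eigenvalue of the matrix $A$ corresponding to an
eigenvector $\mathbf{v}$, then

$$\mathbf{x} = \mathbf{v}\cos(\sqrt{-\lambda}\,t) \quad\text{and}\quad \mathbf{x} = \mathbf{v}\sin(\sqrt{-\lambda}\,t)$$

are solutions of the system of differential equations $\ddot{\mathbf{x}} = A\mathbf{x}$.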


Example 9 


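Find the general solution of the system of differential equations

$$\begin{cases} \ddot{x} = x + 4y, \\ \ddot{y} = x - 2y. \end{cases}$$

Solution

The matrix of coefficients is

$$A = \begin{bmatrix} 1 & 4 \\ 1 & -2 \end{bmatrix}.$$

The eigenvectors of $A$ can be shown to be

$\mathbf{v}_1 = [4 \;\; 1]^T$ with eigenvalue $\lambda_1 = 2$,

$\mathbf{v}_2 = [1 \;\; -1]^T$ with eigenvalue $\lambda_2 = -3$.

So we have one positive and one negative eigenvalue. For the positive
eigenvalue we get the solution

$$\mathbf{x}_1 = C_1\begin{bmatrix} 4 \\ 1 \end{bmatrix}e^{\sqrt{2}\,t} + C_2\begin{bmatrix} 4 \\ 1 \end{bmatrix}e^{-\sqrt{2}\,t}.$$

We could write the solution for the negative eigenvalue as

$$\mathbf{x}_2 = C_3\begin{bmatrix} 1 \\ -1 \end{bmatrix}e^{i\sqrt{3}\,t} + C_4\begin{bmatrix} 1 \\ -1 \end{bmatrix}e^{-i\sqrt{3}\,t},$$

but as discussed above, this is not so convenient because then $C_3$ and $C_4$
have to be complex. So instead we use the above result and write the
solution in terms of sines and cosines:

$$\mathbf{x}_2 = C_3\begin{bmatrix} 1 \\ -1 \end{bmatrix}\cos(\sqrt{3}\,t) + C_4\begin{bmatrix} 1 \\ -1 \end{bmatrix}\sin(\sqrt{3}\,t).$$

Adding all the solutions ($\mathbf{x}_1 + \mathbf{x}_2$), we get the general solution:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}\left(C_1 e^{\sqrt{2}\,t} + C_2 e^{-\sqrt{2}\,t}\right) + \begin{bmatrix} 1 \\ -1 \end{bmatrix}\left(C_3\cos(\sqrt{3}\,t) + C_4\sin(\sqrt{3}\,t)\right),$$

where $C_1$, $C_2$, $C_3$ and $C_4$ are real constants.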


The above ideas can be formalised in the following procedure, which also
tells you what to do when an eigenvalue of the matrix of coefficients is zero.

Procedure 4 Solving a second-order homogeneous linear system

To solve a system $\ddot{\mathbf{x}} = A\mathbf{x}$, where $A$ is an $n \times n$ matrix with
$n$ distinct real eigenvalues, do the following.

1. Find the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$, and a corresponding set
   of eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.

2. Each positive eigenvalue $\lambda$, corresponding to an eigenvector $\mathbf{v}$,
   gives rise to two linearly independent solutions

   $$\mathbf{v}e^{\sqrt{\lambda}\,t} \quad\text{and}\quad \mathbf{v}e^{-\sqrt{\lambda}\,t}.$$

   Each negative eigenvalue $\lambda$, corresponding to an eigenvector $\mathbf{v}$,
   gives rise to two linearly independent solutions

   $$\mathbf{v}\cos(\sqrt{-\lambda}\,t) \quad\text{and}\quad \mathbf{v}\sin(\sqrt{-\lambda}\,t).$$

   A zero eigenvalue corresponding to an eigenvector $\mathbf{v}$ gives rise to
   two linearly independent solutions

   $$\mathbf{v} \quad\text{and}\quad \mathbf{v}t.$$

3. The general solution is then an arbitrary linear combination of
   the $2n$ linearly independent solutions found in Step 2, involving
   $2n$ arbitrary real constants.
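Steps 2 and 3 of Procedure 4 can be sketched in code. This is a minimal illustration under stated assumptions: the eigenvalues and eigenvectors are supplied by the caller (Step 1 is not automated), and the function names `basis_solutions` and `general_solution` are hypothetical, introduced only for this sketch.

```python
import math


def basis_solutions(eigenpairs):
    """Return the 2n basis solutions of x'' = A x as functions of t.

    eigenpairs is a list of (eigenvalue, eigenvector) tuples for A, with
    distinct real eigenvalues as Procedure 4 requires.
    """
    solutions = []
    for lam, v in eigenpairs:
        if lam > 0:
            r = math.sqrt(lam)
            scalars = (lambda t, r=r: math.exp(r * t),
                       lambda t, r=r: math.exp(-r * t))
        elif lam < 0:
            w = math.sqrt(-lam)
            scalars = (lambda t, w=w: math.cos(w * t),
                       lambda t, w=w: math.sin(w * t))
        else:
            # Zero eigenvalue: the two solutions are v and v t.
            scalars = (lambda t: 1.0, lambda t: t)
        for f in scalars:
            solutions.append(lambda t, f=f, v=v: [f(t) * c for c in v])
    return solutions


def general_solution(eigenpairs, constants):
    """Combine the basis solutions with the given 2n real constants."""
    basis = basis_solutions(eigenpairs)
    if len(constants) != len(basis):
        raise ValueError("need exactly 2n constants")

    def x(t):
        terms = [[c * comp for comp in b(t)] for c, b in zip(constants, basis)]
        return [sum(col) for col in zip(*terms)]

    return x


# Eigenpairs of Example 9: lambda = 2 with [4, 1], lambda = -3 with [1, -1].
x = general_solution([(2.0, [4.0, 1.0]), (-3.0, [1.0, -1.0])],
                     [1.0, 0.0, 0.0, 2.0])
print(x(0.0))
```

The constants are ordered as in the worked examples: for each eigenvalue in turn, first the constant for the growing exponential (or cosine, or constant) term, then the constant for the decaying exponential (or sine, or linear) term.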




Complex eigenvalues and 
repeated real eigenvalues are not 
discussed here, but they can be 
dealt with by generalising what 
we have discussed. 


We do not prove this here, but you can verify it in any particular case
(see Example 10 below).

You may like to verify that $[-2 \;\; 2 \;\; 1]^T$ and $[-2 \;\; 2 \;\; 1]^T t$
are both solutions of the system.

We illustrate this procedure in the following example.


Example 10 


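Find the general solution of the system of differential equations

$$\begin{cases} \ddot{x} = 3x + 2y + 2z, \\ \ddot{y} = 2x + 2y, \\ \ddot{z} = 2x + 4z. \end{cases}$$

Solution

The matrix of coefficients is

$$A = \begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}.$$

The eigenvectors of $A$ can be shown to be

$[2 \;\; 1 \;\; 2]^T$ with eigenvalue $\lambda = 6$,
$[1 \;\; 2 \;\; -2]^T$ with eigenvalue $\lambda = 3$,
$[-2 \;\; 2 \;\; 1]^T$ with eigenvalue $\lambda = 0$.

It follows from Procedure 4 that the general solution of the system is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}\left(C_1 e^{\sqrt{6}\,t} + C_2 e^{-\sqrt{6}\,t}\right) + \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix}\left(C_3 e^{\sqrt{3}\,t} + C_4 e^{-\sqrt{3}\,t}\right) + \begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix}\left(C_5 + C_6 t\right).$$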
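The zero-eigenvalue rule used in the last term can be checked directly. The short derivation below is a sketch that uses only the defining property $A\mathbf{v} = \mathbf{0}$ of an eigenvector with eigenvalue zero; $C_5$ and $C_6$ are the constants of Example 10.

```latex
\begin{align*}
\mathbf{x}(t) &= (C_5 + C_6 t)\,\mathbf{v}, \qquad A\mathbf{v} = \mathbf{0}, \\
\ddot{\mathbf{x}}(t) &= \frac{d^2}{dt^2}\,(C_5 + C_6 t)\,\mathbf{v} = \mathbf{0}, \\
A\mathbf{x}(t) &= (C_5 + C_6 t)\,A\mathbf{v} = \mathbf{0}.
\end{align*}
```

Both sides of $\ddot{\mathbf{x}} = A\mathbf{x}$ vanish, so $\mathbf{v}$ and $\mathbf{v}t$ are solutions for any eigenvector $\mathbf{v}$ with eigenvalue zero.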


Exercise 14 
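Find the general solution of the system of differential equations

$$\begin{cases} \ddot{x} = 2x + y - z, \\ \ddot{y} = -3y + 2z, \\ \ddot{z} = 4z. \end{cases}$$

(Hint: The eigenvectors of $\begin{bmatrix} 2 & 1 & -1 \\ 0 & -3 & 2 \\ 0 & 0 & 4 \end{bmatrix}$ are $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -5 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} -5 \\ 4 \\ 14 \end{bmatrix}$, with
corresponding eigenvalues 2, $-3$ and 4.)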


In the next example and exercise, we first find the general solution of a 
system of second-order differential equations, then find a particular 
solution for given initial conditions. This will be relevant for Section 3 on 
normal modes. 


Example 11 
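(a) Find the general solution of the system of differential equations

$$\begin{cases} \ddot{x} = -3x + y, \\ \ddot{y} = x - 3y. \end{cases}$$

(Hint: Eigenvectors of $\begin{bmatrix} -3 & 1 \\ 1 & -3 \end{bmatrix}$ can be shown to be
$\mathbf{v}_1 = [1 \;\; 1]^T$ with eigenvalue $\lambda_1 = -2$,
$\mathbf{v}_2 = [1 \;\; -1]^T$ with eigenvalue $\lambda_2 = -4$.)

(b) Find the particular solution that satisfies the initial conditions
$\mathbf{x}(0) = \mathbf{v}_1$, $\dot{\mathbf{x}}(0) = \mathbf{0}$, where $\mathbf{v}_1$ is the eigenvector given in the hint.

Solution

(a) Using the hint, we see that there are two negative eigenvalues. The
first eigenvalue gives the term

$$\mathbf{x}_1 = C_1\begin{bmatrix} 1 \\ 1 \end{bmatrix}\cos(\sqrt{2}\,t) + C_2\begin{bmatrix} 1 \\ 1 \end{bmatrix}\sin(\sqrt{2}\,t).$$

The second eigenvalue gives the term

$$\mathbf{x}_2 = C_3\begin{bmatrix} 1 \\ -1 \end{bmatrix}\cos(2t) + C_4\begin{bmatrix} 1 \\ -1 \end{bmatrix}\sin(2t).$$

Adding the solutions ($\mathbf{x}_1 + \mathbf{x}_2$), we get the general solution:

$$\mathbf{x}(t) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\left(C_1\cos(\sqrt{2}\,t) + C_2\sin(\sqrt{2}\,t)\right) + \begin{bmatrix} 1 \\ -1 \end{bmatrix}\left(C_3\cos(2t) + C_4\sin(2t)\right).$$

(b) Setting $t = 0$ in the general solution gives

$$\mathbf{x}(0) = C_1\begin{bmatrix} 1 \\ 1 \end{bmatrix} + C_3\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} C_1 + C_3 \\ C_1 - C_3 \end{bmatrix}.$$

Using the initial condition $\mathbf{x}(0) = \mathbf{v}_1$ then gives

$$\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} C_1 + C_3 \\ C_1 - C_3 \end{bmatrix}.$$

Hence $C_1 + C_3 = 1$ and $C_1 - C_3 = 1$, which have solution $C_3 = 0$,
$C_1 = 1$.

Now, differentiating the general solution with respect to $t$, we get

$$\dot{\mathbf{x}}(t) = \sqrt{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\left(-C_1\sin(\sqrt{2}\,t) + C_2\cos(\sqrt{2}\,t)\right) + 2\begin{bmatrix} 1 \\ -1 \end{bmatrix}\left(-C_3\sin(2t) + C_4\cos(2t)\right),$$

and then setting $t = 0$ and using the initial condition $\dot{\mathbf{x}}(0) = \mathbf{0}$ gives

$$\dot{\mathbf{x}}(0) = \mathbf{0} = \sqrt{2}\,C_2\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 2C_4\begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

This gives $\sqrt{2}\,C_2 + 2C_4 = 0$ and $\sqrt{2}\,C_2 - 2C_4 = 0$, which have solution
$C_2 = C_4 = 0$.

So we have $C_2 = C_3 = C_4 = 0$ and $C_1 = 1$. Substituting into the
general solution gives

$$\mathbf{x}(t) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\cos(\sqrt{2}\,t).$$

Clearly this satisfies the initial conditions.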
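As an independent check on Example 11(b), the system can be integrated numerically from the same initial conditions and compared with the closed-form answer. The sketch below is an assumption-laden illustration, not part of the unit's method: it rewrites the second-order system as a first-order one in position and velocity and applies the classical fourth-order Runge-Kutta scheme with a fixed step.

```python
import math

A = [[-3.0, 1.0], [1.0, -3.0]]   # matrix of coefficients in Example 11


def accel(x):
    """Right-hand side A x of the second-order system."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]


def rk4_step(x, v, h):
    """One classical Runge-Kutta step for x' = v, v' = A x."""
    def add(p, q, s):
        return [pi + s * qi for pi, qi in zip(p, q)]

    k1x, k1v = v, accel(x)
    k2x, k2v = add(v, k1v, h / 2), accel(add(x, k1x, h / 2))
    k3x, k3v = add(v, k2v, h / 2), accel(add(x, k2x, h / 2))
    k4x, k4v = add(v, k3v, h), accel(add(x, k3x, h))
    x_new = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
    v_new = [vi + h / 6 * (a + 2 * b + 2 * c + d)
             for vi, a, b, c, d in zip(v, k1v, k2v, k3v, k4v)]
    return x_new, v_new


def integrate(x0, v0, t_end, steps):
    """Integrate from t = 0 to t_end using a fixed number of RK4 steps."""
    x, v = list(x0), list(v0)
    h = t_end / steps
    for _ in range(steps):
        x, v = rk4_step(x, v, h)
    return x


t_end = 2.0
numeric = integrate([1.0, 1.0], [0.0, 0.0], t_end, 2000)   # x(0) = v1, x'(0) = 0
exact = [math.cos(math.sqrt(2.0) * t_end)] * 2             # [1, 1] cos(sqrt(2) t)
error = max(abs(n - e) for n, e in zip(numeric, exact))
print(numeric, exact, error)
```

The two components of the numerical solution stay equal to each other and track $\cos(\sqrt{2}\,t)$, which is what the particular solution found above predicts.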


Exercise 15 


For the system in Example 11, find the particular solution that satisfies
the initial conditions $\mathbf{x}(0) = \mathbf{v}_2$, $\dot{\mathbf{x}}(0) = \mathbf{0}$, where $\mathbf{v}_2$ is the eigenvector
given in the hint.


The particular solutions found in Example 11(b) and Exercise 15 are called 
normal mode solutions and are the topic of the next section. 


3 Normal modes 


In Subsection 2.1 we described a compound harmonic oscillator consisting 
of two masses and two springs (see Figure 5). We mentioned that the 
equations of motion for this system have special solutions, called normal 
modes, in which the two masses oscillate at the same frequency. Here we 
show (in Subsection 3.1) how these normal mode solutions are obtained in 
a simple system, and how they are combined to describe the general 
solution of the equations of motion. 


In Subsection 3.2 we consider an important scientific application of these 
ideas, by applying normal modes to describe the oscillations of certain 
types of simple molecule. These oscillations help scientists to explain how 
the Earth’s atmosphere traps so much of the heat from the Sun, due to the 
greenhouse effect. 


3.1 Motion of a simple two-mass system 


An oscillating system and its equations of motion 


We begin by describing the equations of motion for a rather special 
oscillating system. 


Consider a system of two particles with equal masses $m$ moving without
friction on a rail, connected to three springs as illustrated in Figure 8. The
two outer springs have spring stiffness $K$ and are connected to rigid
supports at either end. The middle spring has stiffness $k$. The
displacements of the masses to the right of their equilibrium positions are
$x_1$ and $x_2$ (for the left and right masses, respectively).

Figure 8 Two masses slide without friction on a rail; the force on each
mass is provided by two springs

The equations of motion for this system are derived from Newton’s second
law and can be shown to be

$$m\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix} = \begin{bmatrix} -(k+K) & k \\ k & -(k+K) \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \tag{38}$$

We do not expect you to be able to derive such equations; they will be
given to you if needed in assessment questions. However, the following is
an argument that makes them plausible. (Don’t worry if you can’t follow
this argument; it is not important for what follows.)


Forces and displacements are 
normally written as vectors. But 
as the motion is in one direction 
here, we simply consider the 
components of the forces and 
displacements in that direction. 


We now move on to solving these equations of motion. 


Solving the equations of motion 


Equation (38) is a system of equations of the form $\ddot{\mathbf{x}} = A\mathbf{x}$, where the
matrix of coefficients is

$$A = \frac{1}{m}\begin{bmatrix} -(k+K) & k \\ k & -(k+K) \end{bmatrix}.$$

From Section 2 we know that solutions are constructed using the
eigenvalues and eigenvectors of $A$.

You studied how to calculate eigenvectors and eigenvalues in Unit 5. The
method is straightforward but a little tedious, so we will simply tell you
what they are, and let you verify for yourself that they satisfy $A\mathbf{v} = \lambda\mathbf{v}$.

The eigenvectors are

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \tag{39}$$

and the corresponding eigenvalues are

$$\lambda_1 = -\frac{K}{m}, \qquad \lambda_2 = -\frac{K + 2k}{m}. \tag{40}$$

Note that the components of both eigenvectors $\mathbf{v} = [v_1 \;\; v_2]^T$ satisfy
$|v_1| = |v_2|$. This is because the system in Figure 8 has a reflection
symmetry about its midpoint, i.e. the symmetry of the system is
linked to the solution to the eigenvalue problem.

We note that because $K$, $k$ and $m$ are all positive, both of the eigenvalues
are negative. Hence from Procedure 4, the solutions of $\ddot{\mathbf{x}} = A\mathbf{x}$ are of the
form

$$\mathbf{x}(t) = \cos(\omega_i t)\,\mathbf{v}_i, \qquad \mathbf{x}(t) = \sin(\omega_i t)\,\mathbf{v}_i, \qquad\text{where } \omega_i = \sqrt{-\lambda_i}. \tag{41}$$

The general solution is a linear combination of all of these solutions. We
have four solutions, two for each eigenvalue:

$$\mathbf{x}(t) = \left(C_1\sin(\omega_1 t) + D_1\cos(\omega_1 t)\right)\mathbf{v}_1 + \left(C_2\sin(\omega_2 t) + D_2\cos(\omega_2 t)\right)\mathbf{v}_2, \tag{42}$$

where $C_1$, $D_1$, $C_2$ and $D_2$ are arbitrary real constants.


Normal modes of vibration 


We note that the general solution, equation (42), can be written as the
sum of two terms

$$\mathbf{x}(t) = \mathbf{x}_{\omega_1}(t) + \mathbf{x}_{\omega_2}(t),$$

where

$$\mathbf{x}_{\omega_1}(t) = \left(C_1\sin(\omega_1 t) + D_1\cos(\omega_1 t)\right)\mathbf{v}_1,$$
$$\mathbf{x}_{\omega_2}(t) = \left(C_2\sin(\omega_2 t) + D_2\cos(\omega_2 t)\right)\mathbf{v}_2.$$

$\mathbf{x}_{\omega_1}(t)$ is the part of the solution that oscillates with angular frequency $\omega_1$,
and $\mathbf{x}_{\omega_2}(t)$ is the part of the solution that oscillates with angular
frequency $\omega_2$. These are called the normal modes of oscillation of the
system.

If we were to set $C_2 = D_2 = 0$ in the general solution, then $\mathbf{x}(t) = \mathbf{x}_{\omega_1}(t)$,
and we would have a solution where both of the masses oscillated with the
same angular frequency $\omega_1$. Likewise, if we set $C_1 = D_1 = 0$, then both
masses would oscillate with angular frequency $\omega_2$. In fact, this defines
what we mean by a normal mode for a system with any number of
dependent variables.




Normal mode 


A normal mode of oscillation is one in which all of the coordinates of 
the system oscillate sinusoidally with the same angular frequency. 


Suppose that we were to choose initial conditions so that $\mathbf{x}(t) = \mathbf{x}_{\omega_1}(t)$.
Then since $\mathbf{v}_1 = [1 \;\; 1]^T$, we have

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \mathbf{x}_{\omega_1}(t) = \left(C_1\sin(\omega_1 t) + D_1\cos(\omega_1 t)\right)\begin{bmatrix} 1 \\ 1 \end{bmatrix},$$

hence the displacement of the masses would obey the equations

$$x_1(t) = x_2(t) = C_1\sin(\omega_1 t) + D_1\cos(\omega_1 t) \quad\text{for } \mathbf{x}(t) = \mathbf{x}_{\omega_1}(t). \tag{43}$$

There are two coefficients, $C_1$ and $D_1$, that are determined by the initial
conditions. We solved a system just like this, for the special case of $m = 1$,
$k = 1$, $K = 2$, in Example 11. In part (b) we found that the initial
conditions $\mathbf{x}(0) = \mathbf{v}_1$ and $\dot{\mathbf{x}}(0) = \mathbf{0}$ lead to the normal mode solution
$\mathbf{x}(t) = \cos(\omega_1 t)\,\mathbf{v}_1$. In fact, these initial conditions give this normal mode
solution for any values of $m$, $k$ and $K$.

In fact, $C_1$ and $D_1$ determine the amplitude and phase of the oscillation.

Likewise, if we chose initial conditions so that $\mathbf{x}(t) = \mathbf{x}_{\omega_2}(t)$, then since
$\mathbf{v}_2 = [1 \;\; -1]^T$, the displacements of the masses would obey the equations

$$x_1(t) = -x_2(t) = C_2\sin(\omega_2 t) + D_2\cos(\omega_2 t) \quad\text{for } \mathbf{x}(t) = \mathbf{x}_{\omega_2}(t). \tag{44}$$

Again, the two coefficients, $C_2$ and $D_2$, are determined by the initial
conditions. For example, the initial conditions $\mathbf{x}(0) = \mathbf{v}_2$ and $\dot{\mathbf{x}}(0) = \mathbf{0}$ lead
to the normal mode solution $\mathbf{x}(t) = \cos(\omega_2 t)\,\mathbf{v}_2$: see Exercise 15 for the
special case of $m = 1$, $k = 1$, $K = 2$.


The normal mode with frequency $\omega_1$ (equations (43)) is called the
symmetric or in-phase mode of oscillation, because the masses always
move in the same direction together. In this case $x_1(t) = x_2(t)$, hence the
displacements of the masses are always equal. The normal mode with
frequency $\omega_2$ (equations (44)) is called the antisymmetric or
out-of-phase mode, because the masses are always moving in opposite
directions. In this case $x_1(t) = -x_2(t)$, hence the displacements are always
of equal magnitude but opposite sign. The motion of the masses in the two
normal modes is illustrated in Figure 9.

Figure 9 Normal modes for the symmetric two-mass system (in-phase and
out-of-phase)

From equations (40) and (41), the angular frequencies of the normal modes
are given by

$$\omega_1 = \sqrt{\frac{K}{m}} \ \text{(in-phase)} \quad\text{and}\quad \omega_2 = \sqrt{\frac{K + 2k}{m}} \ \text{(out-of-phase)}. \tag{45}$$

Note that for the in-phase normal mode solution we have $x_1(t) = x_2(t)$,
hence $x_1(t) - x_2(t) = 0$, so the length of the middle spring never changes.
That is why the frequency of the in-phase normal mode, $\omega_1$, is the same as
that for a simple harmonic oscillator with a spring stiffness equal to $K$.
Also note that since $K$, $k$ and $m$ are all positive, $\omega_2 > \omega_1$, so the
out-of-phase mode oscillates with a higher frequency than the in-phase
mode. This is because each mass is being pulled by two springs instead of
one as in the in-phase case.
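The frequencies in equation (45) and the eigenpairs behind them are easy to evaluate for concrete values. The sketch below is illustrative only: the function names are hypothetical, and the numbers m = 1, k = 1, K = 2 are the special case that the text links to Example 11.

```python
import math


def normal_mode_frequencies(m, k, K):
    """Angular frequencies (in-phase, out-of-phase) from equation (45)."""
    return math.sqrt(K / m), math.sqrt((K + 2 * k) / m)


def check_eigenpair(m, k, K, v, lam, tol=1e-9):
    """Confirm that A v = lam v for the matrix of coefficients of (38)."""
    A = [[-(k + K) / m, k / m], [k / m, -(k + K) / m]]
    Av = [sum(a * c for a, c in zip(row, v)) for row in A]
    return all(abs(av - lam * c) < tol for av, c in zip(Av, v))


m, k, K = 1.0, 1.0, 2.0            # the special case used in Example 11
w_in, w_out = normal_mode_frequencies(m, k, K)
print(w_in, w_out)

# The eigenvalues are minus the squared angular frequencies.
print(check_eigenpair(m, k, K, [1.0, 1.0], -w_in**2))
print(check_eigenpair(m, k, K, [1.0, -1.0], -w_out**2))
```

For these values the in-phase frequency is $\sqrt{2}$ and the out-of-phase frequency is 2, matching the arguments of the cosines in Example 11.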


The motion of each normal mode is a simple sinusoidal oscillation with a
single angular frequency. Only special initial conditions like the ones given
in Example 11 and Exercise 15 give rise to such normal mode motions. For
arbitrary initial conditions (e.g. $\mathbf{x}(0) = [1 \;\; 2]^T$, $\dot{\mathbf{x}}(0) = [0 \;\; 1]^T$) the
solution is a combination of sinusoidal oscillations with two different
frequencies. In other words it is a linear combination of normal mode
solutions, where the displacements of the two masses move in a seemingly
complicated manner and are not proportional to each other.

Figure 10 General non-normal mode motions are a combination of
oscillations at the frequencies of the normal modes (panels, each showing the
displacements against $t$: in-phase normal mode; out-of-phase normal mode;
example of general non-normal mode motion)

Typical motions for these cases are illustrated in Figure 10. Note that
these normal modes are rather special, since the in-phase mode has
$x_1(t) = x_2(t)$ and the out-of-phase mode has $x_1(t) = -x_2(t)$. This special
motion arises because of the reflection symmetry of the system: the
masses and spring constants on either side are equal. For $m_1 \neq m_2$ or
different springs on either side, there are still two normal modes, where
both masses oscillate with the same frequency and $x_1(t) = \alpha\,x_2(t)$, for
some number $\alpha$. The in-phase mode is characterised by $\alpha$ being positive,
and the out-of-phase mode is characterised by $\alpha$ being negative. The
angular frequency of the out-of-phase normal mode is always greater than
that of the in-phase normal mode.
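The sign rule for $\alpha$ translates into a one-line test on an eigenvector. The sketch below is a hypothetical helper, not something defined in the unit; it classifies a two-component eigenvector by the sign of the product of its components, which has the same sign as $\alpha$.

```python
def classify_mode(v):
    """Classify a two-mass normal mode from its eigenvector v = [v1, v2].

    Since x1(t) = alpha * x2(t) with alpha = v1 / v2, the sign of v1 * v2
    is the sign of alpha.
    """
    v1, v2 = v
    product = v1 * v2
    if product > 0:
        return "in-phase"
    if product < 0:
        return "out-of-phase"
    return "undetermined"   # one mass does not move, so the rule does not apply


# Eigenvectors of the symmetric two-mass system, equations (39).
print(classify_mode([1, 1]))
print(classify_mode([1, -1]))
```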


Example 12 

A system of two masses has its motion given by the system

$$\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix} = \begin{bmatrix} -2 & 2 \\ \tfrac{5}{2} & -6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

Note that this does not correspond to a symmetric two-mass system
connected by springs.

(a) Write down the normal mode solutions and identify them as in-phase
or out-of-phase.

(Hint: $A = \begin{bmatrix} -2 & 2 \\ \tfrac{5}{2} & -6 \end{bmatrix}$ has eigenvectors $\mathbf{v}_1 = [2 \;\; 1]^T$ and
$\mathbf{v}_2 = [-2 \;\; 5]^T$, with corresponding eigenvalues $\lambda_1 = -1$ and $\lambda_2 = -7$.)

(b) Which motion will initial conditions $\mathbf{x}(0) = \mathbf{v}_1$, $\dot{\mathbf{x}}(0) = \mathbf{0}$ give rise to?

Solution

(a) The matrix of coefficients has negative eigenvalues. The first
eigenvalue gives the term

$$\mathbf{x}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}\left(C_1\cos(t) + C_2\sin(t)\right),$$

where $C_1$ and $C_2$ are arbitrary real constants. This is the in-phase
normal mode solution, because the components of $\mathbf{v}_1$ have the same
sign. The second eigenvalue gives the term

$$\mathbf{x}_2 = \begin{bmatrix} -2 \\ 5 \end{bmatrix}\left(C_3\cos(\sqrt{7}\,t) + C_4\sin(\sqrt{7}\,t)\right),$$

where $C_3$ and $C_4$ are arbitrary real constants. This is the out-of-phase
normal mode solution, because the components of $\mathbf{v}_2$ have opposite
signs.

We note that the angular frequency of the out-of-phase normal mode
($\omega_\text{out} = \sqrt{7}$) is greater than that of the in-phase normal mode ($\omega_\text{in} = 1$), as
expected.


(b) Following the same reasoning as in Example 11 and Exercise 15, these
initial conditions give rise to the normal mode solution

$$\mathbf{x}(t) = \mathbf{v}_1\cos(t) = \begin{bmatrix} 2 \\ 1 \end{bmatrix}\cos(t).$$

It is straightforward to check that this solution satisfies the given initial
conditions.

Exercise 16

A system of two masses connected by springs has two normal mode
angular frequencies: $\omega_1 = 2$ and $\omega_2 = 4$. Which is the in-phase frequency?

Exercise 17

A system of two masses connected by springs has a characteristic matrix
with eigenvectors $\mathbf{v}_1 = [1 \;\; -2]^T$ and $\mathbf{v}_2 = [1 \;\; 4]^T$. Which gives rise to the
in-phase mode?

Exercise 18

Two masses connected by springs have their motion given by the system

$$\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix} = \begin{bmatrix} -4 & 1 \\ 1 & -4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

(a) Write down the normal mode solutions and identify them as in-phase
or out-of-phase.

(Hint: $A = \begin{bmatrix} -4 & 1 \\ 1 & -4 \end{bmatrix}$ has eigenvectors $\mathbf{v}_1 = [1 \;\; 1]^T$ and
$\mathbf{v}_2 = [1 \;\; -1]^T$, with corresponding eigenvalues $\lambda_1 = -3$ and $\lambda_2 = -5$.)

(b) Which motion will the initial conditions $\mathbf{x}(0) = \mathbf{v}_1$, $\dot{\mathbf{x}}(0) = \mathbf{0}$ give rise
to?


In structural engineering, it is imperative that a building’s normal 
mode frequencies do not match the frequencies of expected 
earthquakes, otherwise an earthquake may make the structure 
vibrate, causing damage. The periodic variation of wind gusts can be 
another cause of resonant vibration in structures like bridges or very 
tall buildings, and measures have to be taken to avoid the vibrations 
becoming too large. For example, the 509 m tall Taipei 101 building 
in Taiwan (the tallest in the world until 2010), shown in Figure 11, 
has a 660 tonne steel pendulum suspended from its 92nd floor to 
dampen resonant vibrations caused by wind gusts. 


3.2 An application: vibrations of simple molecules 


This optional subsection discusses one of the most important physical
applications of normal modes. You will not be assessed on any of
the material in this subsection.


Simple molecules consist of a small number of atoms held together by 
‘chemical bonds’. Figure 12 illustrates some simple molecules. You can 
think of the atoms as behaving like point masses. The chemical bonds are 
not entirely rigid: they can behave like springs, so that the molecule can 
oscillate. We end this unit by considering a model for the frequency of 
oscillation of some simple molecules. 


Figure 11 The Taipei 101 building

Figure 12 Some simple molecules (water, H2O; carbon dioxide, CO2;
methane, CH4), modelled as masses connected by springs

The frequencies of oscillation of molecules depend on their geometry and
on the masses of the atoms and the ‘spring stiffnesses’ of the chemical 
bonds. For most molecules, the normal modes of vibration form a 
complicated three-dimensional motion. In this subsection we consider only 
the vibrations of a simple type of molecule, called a linear triatomic 
molecule. Carbon dioxide is an example of such a molecule. Its small 
oscillations can be of two types. There are bending vibrations, and there 
are motions that involve stretching of the chemical bonds while the atoms 
move along the same line (see Figure 13). In this subsection we confine our 
attention to the stretching motion, and we do not consider the bending 
modes of vibration. 


Figure 13 Modes of vibration of carbon dioxide: bending (not considered
here), symmetric stretch and asymmetric stretch

Our model for the carbon dioxide molecule consists of three atoms in a
row, with masses $m$, $M$ and $m$. Each of the outer masses is connected to
the central mass by a ‘spring’ with stiffness constant $k$. The displacements
of the three masses from their equilibrium positions are $x_1$, $x_2$ and $x_3$,
respectively, as shown in Figure 14. The forces due to the ‘springs’ can be
shown to be $F_1 = k(x_2 - x_1)$, $F_2 = k(x_1 - x_2) + k(x_3 - x_2)$ and
$F_3 = k(x_2 - x_3)$, so that the equations of motion are

$$m\ddot{x}_1 = k(x_2 - x_1),$$
$$M\ddot{x}_2 = k(x_1 - 2x_2 + x_3),$$
$$m\ddot{x}_3 = k(x_2 - x_3).$$

This system is equivalent to the matrix equation

$$\ddot{\mathbf{x}} = A\mathbf{x}, \quad\text{where } A = \begin{bmatrix} -\dfrac{k}{m} & \dfrac{k}{m} & 0 \\[2mm] \dfrac{k}{M} & -\dfrac{2k}{M} & \dfrac{k}{M} \\[2mm] 0 & \dfrac{k}{m} & -\dfrac{k}{m} \end{bmatrix}.$$

Once again the general solution and normal modes are determined from
the eigenvalues and eigenvectors of the coefficient matrix $A$. It is not
difficult to show that the characteristic equation is

$$\det(A - \lambda I) = -\lambda\left(\lambda + \frac{k}{m}\right)\left(\lambda + \frac{k}{m} + \frac{2k}{M}\right) = 0.$$

From this we deduce that the eigenvalues are

$$\lambda_1 = -\frac{k}{m}, \qquad \lambda_2 = -\frac{k}{m} - \frac{2k}{M}, \qquad \lambda_3 = 0.$$

$\lambda_1$ and $\lambda_2$ are both negative, so they correspond to oscillations or
vibrations of the molecule with frequencies

$$\omega_1 = \sqrt{\frac{k}{m}} \quad\text{and}\quad \omega_2 = \sqrt{\frac{k}{m}}\,\sqrt{\frac{M + 2m}{M}}.$$

You are no doubt wondering what sort of motion the zero eigenvalue $\lambda_3$
corresponds to. It turns out that its eigenvector is $\mathbf{v}_3 = [1 \;\; 1 \;\; 1]^T$. This
has a simple physical interpretation: it corresponds to a motion where all
of the atoms move together in the same direction by the same amount, so
the motion does not stretch either of the springs.

We now return to consider the modes of vibration. The vibration at
frequency $\omega_1$ corresponds to the eigenvector $[1 \;\; 0 \;\; -1]^T$, for which the
motion is a symmetric stretch. The frequency $\omega_2$ has eigenvector
$[1 \;\; -2m/M \;\; 1]^T$, and corresponds to an asymmetric stretch normal mode
(see Figure 13).


Consider how this works for carbon dioxide. The masses of atoms are
measured in units of the mass of a hydrogen atom, $m_\mathrm{H}$. Carbon dioxide
(chemical formula CO$_2$) is a symmetric linear triatomic molecule, with two
oxygen atoms, each of mass $m = 16m_\mathrm{H}$, and a carbon atom, of mass
$M = 12m_\mathrm{H}$.
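With these masses, the three eigenpairs of the triatomic model can be verified exactly. The sketch below is an illustration under stated assumptions: the stiffness is set to k = 1 in arbitrary units (its value does not affect the check), the masses are in units of the hydrogen-atom mass, and the helper names are hypothetical. Exact rational arithmetic is used so that the comparison A v = λ v involves no rounding.

```python
from fractions import Fraction


def co2_matrix(k, m, M):
    """Matrix of coefficients for the linear triatomic model."""
    return [[-k / m, k / m, 0],
            [k / M, -2 * k / M, k / M],
            [0, k / m, -k / m]]


def is_eigenpair(A, v, lam):
    """Exact test of A v = lam v (use Fraction inputs to avoid rounding)."""
    Av = [sum(a * c for a, c in zip(row, v)) for row in A]
    return Av == [lam * c for c in v]


# Illustrative values: k = 1 in arbitrary units, masses in units of m_H.
k, m, M = Fraction(1), Fraction(16), Fraction(12)
A = co2_matrix(k, m, M)

pairs = [
    ([1, 0, -1], -k / m),                       # symmetric stretch
    ([1, -2 * m / M, 1], -k / m - 2 * k / M),   # asymmetric stretch
    ([1, 1, 1], Fraction(0)),                   # all atoms move together
]
checks = [is_eigenpair(A, v, lam) for v, lam in pairs]
print(checks)
```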


Figure 14 Displacement of atoms from their equilibrium positions

As in earlier models, these equations are actually the components of the
vector equations for forces and displacements, in the direction of motion.


Unit 6 Systems of linear differential equations 


The ratio of the two stretching vibrational frequencies is therefore
expected to be

$$\frac{\omega_2}{\omega_1} = \sqrt{\frac{M + 2m}{M}} = \sqrt{\frac{44}{12}} \simeq 1.9.$$

The frequencies can be investigated by looking for resonances in the
absorption of carbon dioxide as the frequency of a light source is varied. In
a unit favoured by spectroscopists (cm$^{-1}$), the frequencies of vibration of
carbon dioxide molecules are found to be 667, 1388 and 2349. The smallest
of these is the frequency of bending vibrations, and the other two
correspond to the two stretching vibrations. The ratio of the observed
stretching vibration frequencies is $2349/1388 \simeq 1.7$. This is quite close
to 1.9, indicating that the simple mass and spring model is a reasonable
model for the vibrations of the CO$_2$ molecule.

To convert units of cm$^{-1}$ to hertz, multiply by the speed of light,
$c \simeq 3 \times 10^{10}$ cm s$^{-1}$.

For a better model we need to turn to quantum mechanics and solve the
Schrödinger equation for carbon dioxide.
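The comparison between the model and the measured values is a short calculation. The sketch below simply reproduces the arithmetic; the variable names are illustrative, and the value used for the speed of light is an assumed rounded figure for the unit conversion.

```python
import math

m_O, m_C = 16.0, 12.0   # masses in units of the hydrogen-atom mass

predicted = math.sqrt((m_C + 2 * m_O) / m_C)   # omega_2 / omega_1 from the model
observed = 2349 / 1388                         # ratio of the quoted wavenumbers

relative_gap = abs(predicted - observed) / observed

C_CM_PER_S = 2.998e10                          # speed of light in cm/s (assumed value)
asym_hz = 2349 * C_CM_PER_S                    # asymmetric stretch converted to hertz

print(round(predicted, 2), round(observed, 2), round(relative_gap, 2))
print(f"{asym_hz:.2e}")
```

The predicted ratio exceeds the observed one by a little over ten per cent, which is the sense in which the mass and spring model is described as reasonable rather than exact.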


Learning outcomes 


After studying this unit, you should be able to do the following. 


• Understand and use the terminology associated with systems of linear
  constant-coefficient differential equations.

• Obtain the general solution of a homogeneous system of two or three
  first-order differential equations, by applying knowledge of the
  eigenvalues and eigenvectors of the coefficient matrix.

• Obtain a particular integral of an inhomogeneous system of two
  first-order differential equations in certain simple cases, by using a
  trial solution.

• Obtain the general solution of an inhomogeneous system of two or
  three first-order differential equations, by combining its
  complementary function and a particular integral.

• Apply given initial conditions to obtain the solution of an initial-value
  problem that features a system of two or three first-order differential
  equations.

• Obtain the general solution of a homogeneous system of two or three
  second-order equations, by applying knowledge of the eigenvalues and
  eigenvectors of the coefficient matrix.

• Obtain a particular solution of a pair of second-order homogeneous
  equations, by applying initial conditions.

• Identify normal mode solutions for systems of two masses.
Solutions to exercises


Solution to Exercise 1 


Z 2 1} |x 1}. 
(a) k = fh A i | + 3 inhomogeneous. 
(b) l= f i i |+ Ht inhomogeneous. 
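(c)
\[
\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix};
\]
homogeneous.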


Solution to Exercise 2 
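(a) The matrix of coefficients is

\[
\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}.
\]

Using the given eigenvalues and eigenvectors, we can construct two
independent solutions:

\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
\quad\text{and}\quad
\mathbf{x}_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t}.
\]


(b) The general solution is a general linear combination of the two
independent solutions:

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} -2 \\ 1 \end{bmatrix} e^{2t},
\]

where $\alpha$ and $\beta$ are arbitrary constants.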


Solution to Exercise 3 
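The matrix of coefficients is

\[
\mathbf{A} = \begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix}.
\]

We are given that the eigenvectors of $\mathbf{A}$ are $[1 \;\; 1]^T$ with corresponding
eigenvalue $\lambda = 7$, and $[1 \;\; -1]^T$ with corresponding eigenvalue $\lambda = 3$. The
general solution is therefore

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{7t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{3t},
\]

where $\alpha$ and $\beta$ are arbitrary constants.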
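Solution to Exercise 4


(a) In Exercise 3 we showed that the general solution of these differential
equations is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{7t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{3t}.
\]

Since $x = 4$ and $y = 0$ when $t = 0$, we have

\[
4 = \alpha + \beta, \qquad 0 = \alpha - \beta.
\]

Thus $\alpha = 2$ and $\beta = 2$, so

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= 2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{7t}
+ 2 \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{3t}
= \begin{bmatrix} 2 \\ 2 \end{bmatrix} e^{7t}
+ \begin{bmatrix} 2 \\ -2 \end{bmatrix} e^{3t}.
\]


(b) For large $t$, $e^{7t}$ is much greater than $e^{3t}$, hence $x \simeq 2e^{7t}$ and $y \simeq 2e^{7t}$.
Hence the solution approaches the line $x = y$ for large $t$.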
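The particular solution found in part (a) can be checked by substituting it back into the system. The following is a minimal sketch, assuming the coefficient matrix [[5, 2], [2, 5]] from Exercise 3; the function names `solution`, `derivative` and `residual` are illustrative choices, not notation from the unit.

```python
import math

# Coefficient matrix from Exercise 3 (assumed here to be [[5, 2], [2, 5]]).
A = [[5.0, 2.0], [2.0, 5.0]]


def solution(t):
    """x(t), y(t) for 2[1, 1]e^{7t} + 2[1, -1]e^{3t}."""
    x = 2 * math.exp(7 * t) + 2 * math.exp(3 * t)
    y = 2 * math.exp(7 * t) - 2 * math.exp(3 * t)
    return x, y


def derivative(t):
    """Exact time derivative of solution(t), term by term."""
    dx = 14 * math.exp(7 * t) + 6 * math.exp(3 * t)
    dy = 14 * math.exp(7 * t) - 6 * math.exp(3 * t)
    return dx, dy


def residual(t):
    """Largest component of (dx/dt - A x); zero for a true solution."""
    x, y = solution(t)
    dx, dy = derivative(t)
    return max(abs(dx - (A[0][0] * x + A[0][1] * y)),
               abs(dy - (A[1][0] * x + A[1][1] * y)))


print("initial values:", solution(0.0))
print("residual at t = 0.5:", residual(0.5))
x_late, y_late = solution(2.0)
print("y/x at t = 2:", y_late / x_late)
```

The ratio y/x printed at the end is already very close to 1 by t = 2, which illustrates the approach to the line x = y described in part (b).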


Solution to Exercise 5 


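The matrix of coefficients is

\[
\mathbf{A} = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}.
\]

We are given that the eigenvectors of $\mathbf{A}$ are $[2 \;\; 1 \;\; 1]^T$, $[0 \;\; 1 \;\; 1]^T$ and
$[0 \;\; 1 \;\; -1]^T$, corresponding to the eigenvalues 5, 3 and 1.


The general solution is therefore

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \alpha \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} e^{5t}
+ \beta \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} e^{3t}
+ \gamma \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} e^{t}.
\]

Since $x = 4$, $y = 6$ and $z = 0$ when $t = 0$, we have

\[
\begin{aligned}
4 &= 2\alpha, \\
6 &= \alpha + \beta + \gamma, \\
0 &= \alpha + \beta - \gamma.
\end{aligned}
\]

From this we can deduce that $\alpha = 2$, $\beta = 1$ and $\gamma = 3$, so

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= 2 \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} e^{5t}
+ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} e^{3t}
+ 3 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} e^{t}
= \begin{bmatrix} 4 \\ 2 \\ 2 \end{bmatrix} e^{5t}
+ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} e^{3t}
+ \begin{bmatrix} 0 \\ 3 \\ -3 \end{bmatrix} e^{t}.
\]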
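The eigenvector check and the solution of the three equations for the constants can be sketched in code. This is only an illustration of the arithmetic above: the helper `mat_vec` and the variable names are hypothetical, and the elimination steps are written out by hand rather than using a general solver.

```python
# Coefficient matrix and the (eigenvalue, eigenvector) pairs quoted above.
A = [[5, 0, 0], [1, 2, 1], [1, 1, 2]]
eigenpairs = [(5, [2, 1, 1]), (3, [0, 1, 1]), (1, [0, 1, -1])]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


# Each pair should satisfy A v = lambda v.
eigen_checks = [mat_vec(A, v) == [lam * c for c in v] for lam, v in eigenpairs]

# Initial conditions x = 4, y = 6, z = 0 at t = 0 give
#   4 = 2*alpha,  6 = alpha + beta + gamma,  0 = alpha + beta - gamma.
alpha = 4 / 2
gamma = (6 - 0) / 2           # subtract the third equation from the second
beta = 6 - alpha - gamma      # back-substitute into the second equation

# Rebuild the state at t = 0 from the constants as a consistency check.
constants = [alpha, beta, gamma]
initial_state = [
    sum(c * vec[i] for c, (_, vec) in zip(constants, eigenpairs))
    for i in range(3)
]

print(eigen_checks)
print(alpha, beta, gamma)
print(initial_state)
```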
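Solution to Exercise 6


(a) If $z = 4 + 3i$, then $\overline{z} = 4 - 3i$, which gives $\overline{\overline{z}} = 4 + 3i$.
Hence $\overline{\overline{z}} = z$.

(b) If $z = 4 + 3i$, then $\operatorname{Re} z = 4$. On the other hand, $\overline{z} = 4 - 3i$, which
gives $\tfrac{1}{2}(z + \overline{z}) = \tfrac{1}{2}\bigl((4 + 3i) + (4 - 3i)\bigr) = 4$.
Hence $\operatorname{Re} z = \tfrac{1}{2}(z + \overline{z})$.

(c) If $z = 4 + 3i$, then $\operatorname{Im} z = 3$. On the other hand,
$\tfrac{1}{2i}(z - \overline{z}) = \tfrac{1}{2i}\bigl((4 + 3i) - (4 - 3i)\bigr) = \tfrac{6i}{2i} = 3$.
Hence $\operatorname{Im} z = \tfrac{1}{2i}(z - \overline{z})$.

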
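These three identities are easy to confirm numerically. The short sketch below uses Python's built-in complex type, in which the imaginary unit is written `j`; the variable names are illustrative.

```python
# The complex number used in the exercise, z = 4 + 3i.
z = 4 + 3j

# (a) Conjugating twice returns the original number.
conj = z.conjugate()
double_conj = conj.conjugate()

# (b) The real part is half the sum of z and its conjugate.
real_part = (z + conj) / 2

# (c) The imaginary part is (z - conjugate) divided by 2i.
imag_part = (z - conj) / 2j

print(double_conj == z)
print(real_part, z.real)
print(imag_part, z.imag)
```
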
Solution to Exercise 7 


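(a) The matrix of coefficients is

\[
\mathbf{A} = \begin{bmatrix} -3 & -2 \\ 4 & 1 \end{bmatrix}.
\]

Using the given eigenvalues and eigenvectors, we obtain the general
solution

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= C \begin{bmatrix} 1 \\ -1 - i \end{bmatrix} e^{(-1+2i)t}
+ D \begin{bmatrix} 1 \\ -1 + i \end{bmatrix} e^{(-1-2i)t}.
\]

Now

\[
\begin{aligned}
\begin{bmatrix} 1 \\ -1 - i \end{bmatrix} e^{(-1+2i)t}
&= \begin{bmatrix} e^{-t}(\cos 2t + i \sin 2t) \\ (-1 - i)e^{-t}(\cos 2t + i \sin 2t) \end{bmatrix} \\
&= \begin{bmatrix} e^{-t} \cos 2t + i e^{-t} \sin 2t \\ e^{-t}(\sin 2t - \cos 2t) - i e^{-t}(\sin 2t + \cos 2t) \end{bmatrix},
\end{aligned}
\]

so

\[
\operatorname{Re}\left( \begin{bmatrix} 1 \\ -1 - i \end{bmatrix} e^{(-1+2i)t} \right)
= \begin{bmatrix} \cos 2t \\ \sin 2t - \cos 2t \end{bmatrix} e^{-t},
\qquad
\operatorname{Im}\left( \begin{bmatrix} 1 \\ -1 - i \end{bmatrix} e^{(-1+2i)t} \right)
= \begin{bmatrix} \sin 2t \\ -\sin 2t - \cos 2t \end{bmatrix} e^{-t},
\]

and the general real-valued solution can be written as

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} \cos 2t \\ \sin 2t - \cos 2t \end{bmatrix} e^{-t}
+ \beta \begin{bmatrix} \sin 2t \\ -\sin 2t - \cos 2t \end{bmatrix} e^{-t}.
\]


(b) Since $x = y = 1$ when $t = 0$, we have

\[
1 = \alpha \quad\text{and}\quad 1 = -\alpha - \beta,
\]

so $\alpha = 1$, $\beta = -2$, and the required particular solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} \cos 2t - 2 \sin 2t \\ 3 \sin 2t + \cos 2t \end{bmatrix} e^{-t}.
\]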
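The passage from the complex solution to the real one can be checked with complex arithmetic. The sketch below assumes the eigenvalue -1 + 2i with eigenvector [1, -1 - i], and a coefficient matrix [[-3, -2], [4, 1]] whose second row is inferred from that eigenpair rather than read directly from the exercise. It compares the closed-form particular solution with the combination (real part) - 2 (imaginary part).

```python
import cmath
import math

# Assumed coefficient matrix; the second row is inferred from the eigenpair.
A = [[-3, -2], [4, 1]]
lam = complex(-1, 2)          # eigenvalue -1 + 2i
v = [1, complex(-1, -1)]      # eigenvector [1, -1 - i]

# Check the eigenpair: A v should equal lam * v.
Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
lam_v = [lam * v[0], lam * v[1]]


def particular(t):
    """Closed-form solution with x = y = 1 at t = 0."""
    decay = math.exp(-t)
    x = decay * (math.cos(2 * t) - 2 * math.sin(2 * t))
    y = decay * (3 * math.sin(2 * t) + math.cos(2 * t))
    return x, y


def from_complex(t, alpha=1.0, beta=-2.0):
    """alpha * Re(v e^{lam t}) + beta * Im(v e^{lam t})."""
    w = [component * cmath.exp(lam * t) for component in v]
    return tuple(alpha * c.real + beta * c.imag for c in w)


print(Av == lam_v)
print(particular(0.7))
print(from_complex(0.7))
```

The last two printed pairs should agree to rounding error, since they are the same function written in two ways.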


Solution to Exercise 8


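(a) Using the given eigenvalues and eigenvectors in Procedure 2, we have

\[
\begin{aligned}
\mathbf{v} e^{\lambda t}
&= \begin{bmatrix} 1 \\ -i \\ i \end{bmatrix} e^{(1+i)t}
= e^{t} \begin{bmatrix} 1 \\ -i \\ i \end{bmatrix} (\cos t + i \sin t) \\
&= e^{t} \begin{bmatrix} \cos t + i \sin t \\ \sin t - i \cos t \\ -\sin t + i \cos t \end{bmatrix} \\
&= \underbrace{e^{t} \begin{bmatrix} \cos t \\ \sin t \\ -\sin t \end{bmatrix}}_{\text{real part}}
+ i \underbrace{e^{t} \begin{bmatrix} \sin t \\ -\cos t \\ \cos t \end{bmatrix}}_{\text{imaginary part}}.
\end{aligned}
\]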


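Thus the general real-valued solution is

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= C_1 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} e^{2t}
+ C_2 e^{t} \begin{bmatrix} \cos t \\ \sin t \\ -\sin t \end{bmatrix}
+ C_3 e^{t} \begin{bmatrix} \sin t \\ -\cos t \\ \cos t \end{bmatrix},
\]

where $C_1$, $C_2$ and $C_3$ are real constants.


(b) Putting $x = y = 1$ and $z = 2$ when $t = 0$, we have

\[
\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
= C_1 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
+ C_2 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
+ C_3 \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix},
\]

so $C_2 = 1$, $C_3 = 2$ and $C_1 - C_3 = 1$, giving $C_1 = 3$. Thus the required
solution is

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix} e^{2t}
+ \begin{bmatrix} \cos t + 2 \sin t \\ -2 \cos t + \sin t \\ 2 \cos t - \sin t \end{bmatrix} e^{t}.
\]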


Solution to Exercise 9 
vy) 1] st =2 9 
k =a H er +6 | ew + 


Solution to Exercise 10 


4 

t+ 

ells 
24-7 


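From Example 1, the complementary function is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{2t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-3t}.
\]

For a particular integral, we try

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} at + b \\ ct + d \end{bmatrix},
\]

where $a$, $b$, $c$, $d$ are constants to be determined.

Substituting $x = at + b$, $y = ct + d$ into the differential equations gives

\[
\begin{aligned}
a &= (at + b) + 4(ct + d) - t + 2, \\
c &= (at + b) - 2(ct + d) + 5t,
\end{aligned}
\]

which become

\[
\begin{aligned}
(a + 4c - 1)t + (b + 4d + 2 - a) &= 0, \\
(a - 2c + 5)t + (b - 2d - c) &= 0.
\end{aligned}
\]

Equating the coefficients of $t$ to zero gives

\[
\begin{cases}
a + 4c - 1 = 0, \\
a - 2c + 5 = 0,
\end{cases}
\]

which have the solution $a = -3$, $c = 1$.

Equating the constant terms to zero, and putting $a = -3$, $c = 1$, gives

\[
\begin{cases}
b + 4d + 5 = 0, \\
b - 2d - 1 = 0,
\end{cases}
\]

which have the solution $b = -1$, $d = -1$.


Thus the required particular integral is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} -3t - 1 \\ t - 1 \end{bmatrix},
\]

and the general solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{2t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-3t}
+ \begin{bmatrix} -3t - 1 \\ t - 1 \end{bmatrix}.
\]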
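The two pairs of simultaneous equations for the trial-solution constants can be solved mechanically. The sketch below uses Cramer's rule with exact fractions; `solve_2x2` and `residuals` are hypothetical helper names, and the system being checked is the one implied by the substitution step, namely dx/dt = x + 4y - t + 2 and dy/dt = x - 2y + 5t.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve a11*u + a12*w = r1, a21*u + a22*w = r2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the system is singular")
    u = Fraction(r1 * a22 - a12 * r2) / det
    w = Fraction(a11 * r2 - a21 * r1) / det
    return u, w


# Coefficients of t:   a + 4c = 1,      a - 2c = -5.
a, c = solve_2x2(1, 4, 1, -2, 1, -5)

# Constant terms:      b + 4d = a - 2,  b - 2d = c.
b, d = solve_2x2(1, 4, 1, -2, a - 2, c)


def residuals(t):
    """Left side minus right side of each equation for x = at+b, y = ct+d."""
    x, y = a * t + b, c * t + d
    return (a - (x + 4 * y - t + 2), c - (x - 2 * y + 5 * t))


print(a, b, c, d)
print(residuals(Fraction(7, 3)))
```

Using `Fraction` rather than floating-point numbers keeps the check exact, so the residuals come out as exactly zero for any rational t.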


Solution to Exercise 11 


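The complementary function is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{2t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-3t}.
\]

For a particular integral, we try

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} a \\ b \end{bmatrix} e^{-t},
\]

where $a$ and $b$ are constants to be determined.

Substituting $x = ae^{-t}$, $y = be^{-t}$ into the differential equations gives

\[
\begin{cases}
-ae^{-t} = ae^{-t} + 4be^{-t} + 4e^{-t}, \\
-be^{-t} = ae^{-t} - 2be^{-t} + 5e^{-t},
\end{cases}
\]

which, on dividing by $e^{-t}$ and rearranging, become

\[
\begin{aligned}
2a + 4b &= -4, \\
a - b &= -5.
\end{aligned}
\]

These equations have the solution $a = -4$, $b = 1$.

Thus the required particular integral is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} -4 \\ 1 \end{bmatrix} e^{-t},
\]

and the general solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 4 \\ 1 \end{bmatrix} e^{2t}
+ \beta \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-3t}
+ \begin{bmatrix} -4 \\ 1 \end{bmatrix} e^{-t}.
\]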


Solution to Exercise 12 


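(a) The matrix of coefficients is

\[
\mathbf{A} = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}.
\]

First, we find its eigenvalues. The characteristic equation is

\[
\det(\mathbf{A} - \lambda \mathbf{I})
= \begin{vmatrix} 2 - \lambda & 3 \\ 2 & 1 - \lambda \end{vmatrix} = 0.
\]

Expanding this gives $(2 - \lambda)(1 - \lambda) - 6 = 0$, which simplifies to
$\lambda^2 - 3\lambda - 4 = 0$. This factorises as $(\lambda - 4)(\lambda + 1) = 0$, so the
eigenvalues are $\lambda = -1$ and $\lambda = 4$.


Now we find the eigenvectors.


• For $\lambda = -1$, let the eigenvector be $\mathbf{v} = [x \;\; y]^T$. Then the
  eigenvector equations are

  \[
  (\mathbf{A} - \lambda \mathbf{I})\mathbf{v} = (\mathbf{A} + \mathbf{I})\mathbf{v}
  = \begin{bmatrix} 3 & 3 \\ 2 & 2 \end{bmatrix}
  \begin{bmatrix} x \\ y \end{bmatrix} = \mathbf{0}.
  \]

  These give the simultaneous equations

  \[
  3x + 3y = 0 \quad\text{and}\quad 2x + 2y = 0,
  \]

  which reduce to the single equation $y = -x$. So (setting $x = 1$),
  an eigenvector corresponding to $\lambda = -1$ is $[1 \;\; -1]^T$.


• For $\lambda = 4$, the eigenvector equations are

  \[
  (\mathbf{A} - \lambda \mathbf{I})\mathbf{v} = (\mathbf{A} - 4\mathbf{I})\mathbf{v}
  = \begin{bmatrix} -2 & 3 \\ 2 & -3 \end{bmatrix}
  \begin{bmatrix} x \\ y \end{bmatrix} = \mathbf{0},
  \]

  which give

  \[
  -2x + 3y = 0 \quad\text{and}\quad 2x - 3y = 0,
  \]

  which reduce to the single equation $y = 2x/3$. So (setting $x = 3$)
  an eigenvector corresponding to $\lambda = 4$ is $[3 \;\; 2]^T$.
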
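(b) Using the calculated eigenvalues and eigenvectors, the complementary
function is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-t}
+ \beta \begin{bmatrix} 3 \\ 2 \end{bmatrix} e^{4t}.
\]

We try a particular integral

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} a \\ b \end{bmatrix} e^{2t}.
\]

Then

\[
\begin{cases}
2ae^{2t} = 2ae^{2t} + 3be^{2t} + e^{2t}, \\
2be^{2t} = 2ae^{2t} + be^{2t} + 4e^{2t},
\end{cases}
\]

which give $3b + 1 = 0$ and $b - 2a = 4$, so $b = -\tfrac{1}{3}$ and $a = -\tfrac{13}{6}$. The
general solution is therefore

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \alpha \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-t}
+ \beta \begin{bmatrix} 3 \\ 2 \end{bmatrix} e^{4t}
- \frac{1}{6} \begin{bmatrix} 13 \\ 2 \end{bmatrix} e^{2t}.
\]

Putting $t = 0$, we obtain

\[
\tfrac{5}{6} = \alpha + 3\beta - \tfrac{13}{6}, \qquad
\tfrac{2}{3} = -\alpha + 2\beta - \tfrac{1}{3},
\]

so $\alpha + 3\beta = 3$ and $-\alpha + 2\beta = 1$, which give $\alpha = \tfrac{3}{5}$ and $\beta = \tfrac{4}{5}$.


The required solution is therefore

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \frac{3}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} e^{-t}
+ \frac{4}{5} \begin{bmatrix} 3 \\ 2 \end{bmatrix} e^{4t}
- \frac{1}{6} \begin{bmatrix} 13 \\ 2 \end{bmatrix} e^{2t}.
\]


(c) As $t$ gets large, the $e^{4t}$ term will become larger than any of the other
terms. Hence

\[
\begin{bmatrix} x \\ y \end{bmatrix}
\simeq \frac{4}{5} \begin{bmatrix} 3 \\ 2 \end{bmatrix} e^{4t}
\quad\text{for large } t.
\]

So $x \simeq \tfrac{12}{5} e^{4t}$ and $y \simeq \tfrac{8}{5} e^{4t}$. Hence $x/y = \tfrac{12}{8} = \tfrac{3}{2}$, thus the solution will
approach the line $x = 3y/2$ for large $t$.
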
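The arithmetic in this solution can be cross-checked with exact fractions. The sketch below computes the eigenvalues of the 2 by 2 matrix from its trace and determinant, then repeats the steps for the particular integral and the constants. It assumes the initial values x(0) = 5/6 and y(0) = 2/3 as they appear in the working; all variable names are illustrative.

```python
import math
from fractions import Fraction

# Coefficient matrix of the system.
A = [[2, 3], [2, 1]]

# For a 2x2 matrix the characteristic equation is
#   lambda^2 - trace*lambda + det = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
discriminant = trace * trace - 4 * det
root = math.isqrt(discriminant)   # exact here, since the discriminant is 25
eigenvalues = sorted([Fraction(trace - root, 2), Fraction(trace + root, 2)])

# Particular integral [a, b] e^{2t}:  3b + 1 = 0  and  b - 2a = 4.
b = Fraction(-1, 3)
a = (b - 4) / 2

# Initial values as read from the working (an assumption of this sketch).
x0, y0 = Fraction(5, 6), Fraction(2, 3)

# alpha + 3*beta = x0 - a   and   -alpha + 2*beta = y0 - b.
rhs1, rhs2 = x0 - a, y0 - b
beta = (rhs1 + rhs2) / 5      # adding the two equations eliminates alpha
alpha = rhs1 - 3 * beta

# Long-time slope x/y of the dominant e^{4t} term, along eigenvector [3, 2].
limiting_ratio = Fraction(3 * beta, 2 * beta)

print(eigenvalues, a, b, alpha, beta, limiting_ratio)
```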
Solution to Exercise 13 
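The matrix of coefficients is

\[
\mathbf{A} = \begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix},
\]

and we are given that the eigenvectors are $[1 \;\; 1]^T$ corresponding to the
eigenvalue $\lambda = 7$, and $[1 \;\; -1]^T$ corresponding to the eigenvalue $\lambda = 3$.


It follows that the general solution is

\[
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \left( C_1 e^{\sqrt{7}\,t} + C_2 e^{-\sqrt{7}\,t} \right)
+ \begin{bmatrix} 1 \\ -1 \end{bmatrix} \left( C_3 e^{\sqrt{3}\,t} + C_4 e^{-\sqrt{3}\,t} \right).
\]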
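One way to gain confidence in a second-order general solution of this form is to test it numerically. The sketch below assumes the system is the second-order one, with the second derivative of [x, y] equal to A times [x, y], and with A = [[5, 2], [2, 5]]. It estimates the second derivative by a central finite difference and compares it with A x for an arbitrary choice of the four constants. The step size `h` and the function names are illustrative choices.

```python
import math

A = [[5.0, 2.0], [2.0, 5.0]]
# (eigenvalue, eigenvector) pairs quoted in the solution.
MODES = [(7.0, (1.0, 1.0)), (3.0, (1.0, -1.0))]


def general_solution(t, constants):
    """Evaluate the general solution for constants (C1, C2, C3, C4)."""
    c1, c2, c3, c4 = constants
    state = [0.0, 0.0]
    for (lam, vec), (c_plus, c_minus) in zip(MODES, [(c1, c2), (c3, c4)]):
        rate = math.sqrt(lam)
        amplitude = c_plus * math.exp(rate * t) + c_minus * math.exp(-rate * t)
        state[0] += amplitude * vec[0]
        state[1] += amplitude * vec[1]
    return state


def residual(t, constants, h=1e-4):
    """Largest component of (finite-difference x'' - A x) at time t."""
    before = general_solution(t - h, constants)
    here = general_solution(t, constants)
    after = general_solution(t + h, constants)
    worst = 0.0
    for i in range(2):
        second_derivative = (after[i] - 2.0 * here[i] + before[i]) / (h * h)
        forcing = A[i][0] * here[0] + A[i][1] * here[1]
        worst = max(worst, abs(second_derivative - forcing))
    return worst


print(residual(0.3, (1.0, 2.0, -1.0, 0.5)))
```

The printed residual is not exactly zero because the finite difference is only an approximation, but it should be small compared with the size of the solution itself.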


Solution to Exercise 14


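Using the given eigenvalues and eigenvectors, we obtain the general
solution

\[
\begin{aligned}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
&= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \left( C_1 e^{\sqrt{2}\,t} + C_2 e^{-\sqrt{2}\,t} \right)
+ \begin{bmatrix} 1 \\ -5 \\ 0 \end{bmatrix} \left( C_3 \cos(\sqrt{3}\,t) + C_4 \sin(\sqrt{3}\,t) \right) \\
&\quad + \begin{bmatrix} -5 \\ 4 \\ 14 \end{bmatrix} \left( C_5 e^{2t} + C_6 e^{-2t} \right).
\end{aligned}
\]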

Solution to Exercise 15 


In the solution to Example 11(b), we used a rather pedestrian approach to 
finding the particular solution. Let’s do this a bit more smartly this time. 
First, write the general solution as 


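\[
\mathbf{x}(t) = \mathbf{v}_1 \left( C_1 \cos(\sqrt{2}\,t) + C_2 \sin(\sqrt{2}\,t) \right)
+ \mathbf{v}_2 \left( C_3 \cos(2t) + C_4 \sin(2t) \right).
\]

Setting $t = 0$ and using the initial condition $\mathbf{x}(0) = \mathbf{v}_2$ gives

\[
\mathbf{x}(0) = \mathbf{v}_2 = C_1 \mathbf{v}_1 + C_3 \mathbf{v}_2.
\]

Clearly this has solution $C_1 = 0$ and $C_3 = 1$.

Now, differentiating the general solution with respect to $t$ gives

\[
\dot{\mathbf{x}}(t) = \sqrt{2}\,\mathbf{v}_1 \left( -C_1 \sin(\sqrt{2}\,t) + C_2 \cos(\sqrt{2}\,t) \right)
+ 2\mathbf{v}_2 \left( -C_3 \sin(2t) + C_4 \cos(2t) \right),
\]

and using the initial condition $\dot{\mathbf{x}}(0) = \mathbf{0}$ gives

\[
\mathbf{0} = \sqrt{2}\,C_2 \mathbf{v}_1 + 2 C_4 \mathbf{v}_2.
\]

This clearly has solution $C_2 = C_4 = 0$. Substituting these values in the
general solution gives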


x(t) = v2 cos(2t) = 3] cos(2t). 
Clearly this satisfies the initial conditions. 


Solution to Exercise 16 


$\omega_2 > \omega_1$, so $\omega_2$ is out-of-phase and $\omega_1$ is in-phase.


Solution to Exercise 17 


Since the components of $\mathbf{v}_1$ have the same sign, this must give rise to the
in-phase mode.
Solution to Exercise 18


(a) The matrix of coefficients has negative eigenvalues. The first
eigenvalue gives the term

\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \left( C_1 \cos(\sqrt{3}\,t) + C_2 \sin(\sqrt{3}\,t) \right),
\]

where $C_1$ and $C_2$ are arbitrary real constants. This is the in-phase
normal mode solution. The second eigenvalue gives the term

\[
\mathbf{x}_2 = \mathbf{v}_2 \left( C_3 \cos(\sqrt{5}\,t) + C_4 \sin(\sqrt{5}\,t) \right),
\]

where $\mathbf{v}_2$ is the corresponding eigenvector, and $C_3$ and $C_4$ are arbitrary
real constants. This is the out-of-phase normal mode solution.


(b) Following the same reasoning as in Example 11 and Exercise 15, these
initial conditions give rise to the normal mode solution

\[
\mathbf{x}(t) = \mathbf{v}_1 \cos(\sqrt{3}\,t) = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \cos(\sqrt{3}\,t).
\]

It is obvious that this solution satisfies the given initial conditions.

